ABSTRACT



A general purpose, programmable media processor for
processing and transmitting a media data stream of audio,
video, radio, graphics, encryption, authentication, and
networking information in real-time. The media processor
incorporates an execution unit that maintains substantially
peak data throughput of media data streams. The execution
unit includes a dynamically partitionable multi-precision
arithmetic unit, programmable switch and programmable
extended mathematical element. A high bandwidth external
interface supplies media data streams at substantially peak
rates to a general purpose register file and the
multi-precision execution unit. A memory management unit, and
instruction and data cache/buffers are also provided. High
bandwidth memory controllers are linked in series to
provide a memory channel to the general purpose,
programmable media processor. The general purpose, programmable
media processor is disposed in a network fabric consisting of
fiber optic cable, coaxial cable and twisted pair wires to
transmit, process and receive single or unified media data
streams. Parallel general purpose media processors are
disposed throughout the network in a distributed virtual manner
to allow for multi-processor operations and sharing of
resources through the network. A method for receiving,
processing and transmitting media data streams over the
communications fabric is also provided.

8 Claims, 25 Drawing Sheets 
GENERAL PURPOSE, MULTIPLE
PRECISION PARALLEL OPERATION, 
PROGRAMMABLE MEDIA PROCESSOR 

This is a divisional of application Ser. No. 08/516,036,
filed Aug. 16, 1995, now U.S. Pat. No. 5,742,840.

A Microfiche Appendix consisting of 4 sheets (387 total 
frames) of microfiche is included in this application. The 
Microfiche Appendix contains material which is subject to 
copyright protection. The copyright owner has no objection 
to the facsimile reproduction by any one of the Microfiche 
Appendix, as it appears in the Patent and Trademark Office
patent files or records, but otherwise reserves all copyright
rights whatsoever.

FIELD OF THE INVENTION 

This invention relates to the field of communications 
processing, and more particularly, to a method and apparatus 
for real-time processing of multi-media digital communica- 
tions. 

BACKGROUND OF THE INVENTION 

Optical fiber and discs have made the transmission and 
storage of digital information both cheaper and easier than
older analog technologies. An improved system for digital 
processing of media data streams is necessary in order to 
realize the full potential of these advanced media. 

For the past century, telephone service delivered over 
copper twisted pair has been the lingua franca of commu- 
nications. Over the next century, broadband services deliv- 
ered over optical fiber and coax will more completely fulfill 
the human need for sensory information by supplying voice, 
video, and data at rates of about 1,000 times greater than
narrow band telephony. Current general-purpose
microprocessors and digital signal processors ("DSPs") can handle
digital voice, data, and images at narrow band rates, but they
are far too slow for processing media data at broadband
rates.

This shortfall in digital processing of broadband media is 
currently being addressed through the design of many
different kinds of application-specific integrated circuits
("ASICs"). For example, a prototypical broadband device
such as a cable modem modulates and demodulates digital
data at rates up to 45 Mbits/sec within a single 6 MHz cable
channel (as compared to rates of 28.8 Kbits/sec within a 6
KHz channel for telephone modems) and transcodes it onto
a 10/100 baseT connection to a personal computer ("PC") or
workstation. Current cable modems thus receive data from
a coaxial cable connection through a chain of specialized
ASIC devices in order to accomplish Quadrature Amplitude
Modulation ("QAM") demodulation, Reed-Solomon error
correction, packet filtering, Data Encryption Standard 
("DES") decryption, and Ethernet protocol handling. The 
cable modems also transmit data to the coaxial cable link 
through a second chain of devices to achieve DES 
encryption, Reed-Solomon block encoding, and Quaternary 
Phase Shift Keying ("QPSK") modulation. In these 
environments, a general-purpose processor is usually 
required as well in order to perform initialization, statistics 
collection, diagnostics, and network management functions. 
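
The rates quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `spectral_efficiency` is an assumption introduced here, and the inputs are simply the rates and channel widths stated in the text.

```python
def spectral_efficiency(bits_per_sec: float, channel_hz: float) -> float:
    """Return the bits per second carried per hertz of channel bandwidth."""
    return bits_per_sec / channel_hz


# Figures quoted in the text above.
cable_rate, cable_channel = 45e6, 6e6      # 45 Mbits/sec in a 6 MHz channel
phone_rate, phone_channel = 28.8e3, 6e3    # 28.8 Kbits/sec in a 6 KHz channel

cable_eff = spectral_efficiency(cable_rate, cable_channel)  # bits/sec per Hz
phone_eff = spectral_efficiency(phone_rate, phone_channel)  # bits/sec per Hz
rate_ratio = cable_rate / phone_rate                        # cable vs. telephone
channel_ratio = cable_channel / phone_channel               # width ratio

print(cable_eff, phone_eff, rate_ratio, channel_ratio)
```

With these inputs the channel is 1,000 times wider while the bits carried per hertz rise by a factor of only about 1.6, so most of the rate increase comes from channel width rather than from denser modulation.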

The ASIC approach to media processing has three
fundamental flaws: cost, complexity, and rigidity. The
combined silicon area of all the specialized ASIC devices
required in the cable modem, for example, results in a
component cost incompatible with the per subscriber price
target for a cable service. The cable plant itself is a very
hostile service environment, with noise ingress, reflections,
nonlinear amplifiers, and other channel impairments,
especially when viewed in the upstream direction. Telephony
modems have developed an elaborate hierarchy of
algorithms implemented in DSP software, with automatic
reduction of data rates from 28.8 Kbits/sec to 19.6 Kbits/sec, 14.4
Kbits/sec, or much lower rates as needed to accommodate
noise, echoes, and other impairments in the copper plant.
Implementing similar algorithms on an ASIC-based broadband
modem is far more complex than doing so in software.

These problems of cost, complexity, and rigidity are
compounded further in more complete broadband devices
such as digital set-top boxes, multimedia PCs, or video
conferencing equipment, all of which go beyond the basic
radio frequency ("RF") modem functions to include a broad
range of audio and video compression and decoding
algorithms, along with remote control and graphical user
interfaces. Software for these devices must control what
amounts to a heterogeneous multi-processor, where each
specialized processor has a different, and usually eccentric
or primitive, programming environment. Even if these
programming environments are mastered, the degree of
programmability is limited. For example, Motion Picture Expert
Group-I ("MPEG-I") chips manufactured by AT&T
Corporation will not implement advances such as fractal- and
wavelet-based compression algorithms, nor are these chips
readily software upgradeable to the MPEG-II standard.
A broadband network operator who leases an MPEG ASIC-based
product is therefore at risk of having to continuously
upgrade his system by purchasing significant amounts of
new hardware just to track the evolution of MPEG
standards.

The high cost of ASIC-based media processing results
from inefficiencies in both memory and logic. A typical
ASIC consists of a multiplicity of specialized logic blocks,
each with a small memory dedicated to holding the data
which comprises the working set for that block. The silicon
area of these multiple small memories is further increased by
the overhead of multiple decoders, sense amplifiers, write
drivers, etc. required for each logic block. The logic blocks
are also constrained to operate at frequencies determined by
the internal symbol rates of broadband algorithms in order to
avoid additional buffer memories. These frequencies
typically differ from the optimum speed-area operating point of
a given semiconductor technology. Interconnect and
synchronization of the many logic and memory blocks are also
major sources of overhead in the ASIC approach.

The disadvantages of the prior ASIC approach can be
overcome by a single unified media processor. The cost
advantages of such a unified processor can be achieved by
gathering all the many ASIC functions of a broadband media
product into a single integrated circuit. Cost is further
reduced by shrinking the total memory area of such a
circuit, replacing the multiplicity of small ASIC memories
with a single memory hierarchy large enough to accommodate
the sum total of all the working sets, and wide enough to
supply the aggregate bandwidth needs of all the logic
blocks. Additionally, the logic block interconnect
circuitry to this memory hierarchy may be streamlined by
providing a generally programmable switching fabric. Many
of the logic blocks themselves can also be replaced with a
single multi-precision arithmetic unit, which can be
internally partitioned under software control to perform
addition, multiplication, division, and other integer and
floating point arithmetic operations on symbol streams of
varying widths, while sustaining the full data throughput of
the memory hierarchy. The residue of logic blocks that
perform operations that are neither arithmetic nor
permutation group oriented can be replaced with an extended
math unit that supports additional arithmetic operations
such as finite field, ring, and table lookup, while also
sustaining the full data throughput of the memory hierarchy.

The above multi-precision arithmetic, permutation
switch, and extended math operations can then be organized
as machine instructions that transfer their operands to and
from a single wide multi-ported register file. These
instructions can be further supplemented with load/store
instructions that transfer register data to and from a data
buffer/cache static random access memory ("SRAM") and main
memory dynamic random access memories ("DRAMs"),
and with branch instructions that control the flow of
instructions executed from an instruction buffer/cache SRAM.
Extensions to the load/store instructions can be made for
synchronization, and to branch instructions for protected
gateways, so that multiple threads of execution for audio,
video, radio, encryption, networking, etc. can efficiently and
securely share memory and logic resources of a unified
machine operating near the optimum speed-area point of the
target semiconductor process. The data path for such a
unified media processor can interface to a high speed
input/output ("I/O") subsystem that moves media streams
across ultra-high bandwidth interfaces to external storage
and I/O.

Such a device would incorporate all of the processing
capabilities of the specialized multi-ASIC combination into
a single, unified processing device. The unified processor
would be agile and capable of reprogramming through the
transmission of new programs over the communication
medium. This programmable, general purpose device is thus
less costly than the specialized processor combination,
easier to operate and reprogram, and can be installed or
applied in many differing devices and situations. The device
may also be scalable to communications applications that
support vast numbers of users through massively parallel
distributed computing.

It is therefore an object of this invention to process media
data streams by executing operations at very high bandwidth
rates.

It is also an object of this invention to unify the audio,
video, radio, graphics, encryption, authentication, and
networking protocols into a single instruction stream.

It is also an object of this invention to achieve high
bandwidth rates in a unified processor that is easy to
program and more flexible than a heterogeneous
combination of special purpose processors.

It is a further object of the invention to support high level
mathematical processing in a unified media processor,
including finite group, finite field, finite ring and table
look-up operations, all at high bandwidth rates.

It is yet a further object of the invention to provide a
unified media processor that can be replicated into a
multi-processor system to support a vast array of users.

It is yet another object of this invention to allow for
massively parallel systems within the switching fabric to
support very large numbers of subscribers and services.

It is also an object of the invention to provide a general
purpose programmable processor that could be employed at
all points in a network.

It is a further object of this invention to sustain very high
bandwidth rates to arbitrarily large memory and input/output
systems.

SUMMARY OF THE INVENTION

In view of the above, there is provided a system for media
processing that maintains substantially peak data throughput
in the execution and transmission of multiple media data
streams. The system includes in one aspect a general
purpose, programmable media processor, and in another
aspect includes a method for receiving, processing and
transmitting media data streams. The general purpose,
programmable media processor of the invention further
includes an execution unit, high bandwidth external
interface, and can be employed in a parallel multi-processor
system.

According to the apparatus of the invention, an execution
unit is provided that maintains substantially peak data
throughput in the unified execution of multiple media data
streams. The execution unit includes a data path, and a
multi-precision arithmetic unit coupled to the data path and
capable of dynamic partitioning based on the elemental
width of data received from the data path. The execution unit
[illegible] programmable to manipulate data received from the
data path [illegible] extended mathematical element is also
provided, which is coupled to the data path and programmable
to implement additional mathematical operations at
substantially peak data throughput. In a preferred
embodiment of the execution unit, at least one register file
is coupled to the data path.

According to another aspect of the invention, a general
purpose programmable media processor is provided having
an instruction path and a data path to digitally process a
plurality of media data streams. The media processor
includes a high bandwidth external interface operable to
receive a plurality of data of various sizes from an external
source and communicate the received data over the data path
at a rate that maintains substantially peak operation of the
media processor. At least one register file is included, which
is configurable to receive and store data from the data path
and to communicate the stored data to the data path. A
multi-precision execution unit is coupled to the data path
and is dynamically configurable to partition data received
from the data path to account for the elemental symbol size
of the plurality of media streams, and is programmable to
operate on the data to generate a unified symbol output to the
data path.

According to the preferred embodiment of the media
processor, means are included for moving data between
registers and memory by performing load and store
operations, and for coordinating the sharing of data among
a plurality of tasks by performing synchronization
operations based upon instructions and data received by the
execution unit. Means are also provided for securely
controlling the sequence of execution by performing branch and
gateway operations based upon instructions and data
received by the execution unit. A memory management unit
operable to retrieve data and instructions for timely and
secure communication over the data path and instruction
path respectively is also preferably included in the media
processor. The preferred embodiment also includes a
combined instruction cache and buffer that is dynamically
allocated between cache space and buffer space to ensure
real-time execution of multiple media instruction streams,
and a combined data cache and buffer that is dynamically
allocated between cache space and buffer space to ensure
response for multiple media data streams.

In another aspect of the invention, a high bandwidth
processor interface for receiving and transmitting a media
stream is provided having a data path operable to transmit
media information at sustained peak rates. The high
bandwidth processor interface includes a plurality of memory
controllers coupled in series to communicate stored media
information to and from the data path, and a plurality of
memory elements coupled in parallel to each of the plurality
of memory controllers for storing and retrieving the media
information. In the preferred embodiment of the high
bandwidth processor interface, the plurality of memory
controllers each comprise a paired link disposed between each
memory controller, where the paired links each transmit and
receive plural bits of data and have differential data inputs
and outputs and a differential clock signal.

Yet another aspect of the invention includes a system for 
unified media processing having a plurality of general 
purpose media processors, where each media processor is 
operable at substantially peak data rates and has a dynami- 
cally partitioned execution unit and a high bandwidth inter- 
face for communicating to memory and input/output ele- 
ments to supply data to the media processor at substantially 
peak rates. A bi-directional communication fabric is 
provided, to which the plurality of media processors are
coupled, to transmit and receive at least one media stream
comprising presentation, transmission, and storage media
information. The bi-directional communication fabric
preferably comprises a fiber optic network, and a subset of the
plurality of media processors comprise network servers.

According to yet another aspect of the invention, a 
parallel multi-media processor system is provided having a 
data path and a high bandwidth external interface coupled to
the data path and operable to receive a plurality of data of
various sizes from an external source and communicate the
received data at a rate that maintains substantially peak
operation of the parallel multi-processor system. A plurality
of register files, each having at least one register coupled to
the data path and operable to store data, are also included. At
least one multi-precision execution unit is coupled to the data
path and is dynamically configurable to partition data
received from the data path to account for the elemental
symbol size of the plurality of media streams, and is
programmable to operate in parallel on data stored in the 
plurality of register files to generate a unified symbol output 
for each register file. 
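
One way to picture a dynamically partitioned execution unit is as a single wide adder whose carry chain is cut at symbol boundaries. The following Python sketch is only an illustration and is not the circuit described here: the 128-bit register width, the function name `partitioned_add`, and the wrap-around behavior of each lane are all assumptions made for the example.

```python
REGISTER_BITS = 128  # assumed register width, chosen for illustration only


def partitioned_add(a: int, b: int, lane_bits: int) -> int:
    """Add two REGISTER_BITS-wide words lane by lane.

    Each lane is lane_bits wide and wraps around on overflow, so a carry
    never crosses from one lane into the next.
    """
    if lane_bits < 1 or REGISTER_BITS % lane_bits != 0:
        raise ValueError("lane width must divide the register width")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, REGISTER_BITS, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift  # discard the lane's carry out
    return result


# The same bit patterns give different results under different partitions.
x = 0x00FF00FF
y = 0x00010001
as_bytes = partitioned_add(x, y, 8)    # 8-bit lanes: each 0xFF + 0x01 wraps to 0x00
as_halves = partitioned_add(x, y, 16)  # 16-bit lanes: each 0x00FF + 0x0001 is 0x0100
print(hex(as_bytes), hex(as_halves))
```

Changing only `lane_bits` changes how the same operands are interpreted, which is the sense in which a single unit can account for the elemental symbol size of different media streams.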

According to the method of the invention, unified streams 
of media data are processed by receiving a stream of unified
media data including presentation, transmission and storage
information. The unified stream of media data is
dynamically partitioned into component fields of at least one bit
based on the elemental symbol size of data received. The
unified stream of media data is then processed at
substantially peak operation.
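
The partitioning step can be illustrated by splitting one wide word into equal component fields whose width follows the elemental symbol size. This is a minimal sketch under stated assumptions: the names `split_fields` and `join_fields`, the 128-bit word width, and the least-significant-field-first ordering are hypothetical choices made here, not details taken from the text.

```python
from typing import List

WORD_BITS = 128  # assumed word width, for illustration only


def split_fields(word: int, symbol_bits: int) -> List[int]:
    """Split a WORD_BITS-wide word into fields of symbol_bits each.

    Fields are returned least significant first. A symbol size of one bit
    is allowed, matching "component fields of at least one bit".
    """
    if symbol_bits < 1 or WORD_BITS % symbol_bits != 0:
        raise ValueError("symbol size must divide the word width")
    mask = (1 << symbol_bits) - 1
    return [(word >> shift) & mask for shift in range(0, WORD_BITS, symbol_bits)]


def join_fields(fields: List[int], symbol_bits: int) -> int:
    """Inverse of split_fields: pack fields back into one word."""
    mask = (1 << symbol_bits) - 1
    word = 0
    for index, field in enumerate(fields):
        word |= (field & mask) << (index * symbol_bits)
    return word


sample = 0x0123456789ABCDEF
byte_fields = split_fields(sample, 8)   # sixteen 8-bit fields
bit_fields = split_fields(sample, 1)    # one hundred twenty-eight 1-bit fields
print(len(byte_fields), len(bit_fields))
```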

In one aspect of the invention, the unified stream of media 
data is processed by storing the stream of unified media data 
in a general register file. Multi-precision arithmetic
operations can then be performed on the stored stream of unified
media data based on programmed instructions, where the
multi-precision arithmetic operations include Boolean,
integer and floating point mathematical operations. The
component fields of unified media data can then be manipulated
based on programmed instructions that implement copying,
shifting and re-sizing operations. Multi-precision
mathematical operations can also be performed on the stored
stream of unified media data based on programmed
instructions, where the mathematical operations include
finite group, finite field, finite ring and table look-up
operations. Instruction and data pre-fetching are included to fill
instruction and data pipelines, and memory management
operations can be performed to retrieve instructions and data
from external memory. The instructions and data are
preferably stored in instruction and data cache/buffers, in which
buffer storage in the instruction and data cache/buffers is
dynamically allocated to ensure real-time execution.
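
As an illustration of what a finite field operation involves, the sketch below multiplies two elements of GF(2^8) using shift and XOR steps. It is an example under assumptions rather than a description of the extended mathematical element: the reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is one commonly used with Reed-Solomon codes and is chosen here only for concreteness, and `gf256_mul` is a hypothetical name.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply a and b (each 0..255) in GF(2^8) modulo a degree-8 polynomial.

    Addition in this field is XOR, so the product is built by XOR-ing
    shifted copies of a, reducing by poly whenever a overflows eight bits.
    """
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a       # add the current multiple of a
        b >>= 1
        a <<= 1                # multiply a by x
        if a & 0x100:
            a ^= poly          # reduce back to eight bits
    return product


# x * x^7 = x^8, which reduces to x^4 + x^3 + x^2 + 1 under the assumed polynomial.
example = gf256_mul(0x02, 0x80)
print(hex(example))
```

Because every step is a shift, a mask test, or an XOR, an operation of this kind is neither ordinary integer arithmetic nor a pure permutation, which is why it falls to a separate class of mathematical operations.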

Other aspects of the invention include a method for
achieving high bandwidth communications between a gen- 
eral purpose media processor and external devices by pro- 
viding a high bandwidth interface disposed between the 
media processor and the external devices, in which the high 
bandwidth interface comprises at least one uni-directional 
channel pair having an input port and an output port. A 
plurality of media data streams, comprising component 
fields of various sizes, are transmitted and received between 
the media processor and the external devices at a rate that 
sustains substantially peak data throughput at the media 
processor. A method for processing streams of media data is 
also included that provides a bi-directional communications 
fabric for transmitting and receiving at least one stream of 
media data, where the at least one stream of media data 
comprises presentation, transmission and storage informa- 
tion. At least one programmable media processor is provided 
within the communications network for receiving, process- 
ing and transmitting the at least one stream of unified media 
data over the bi-directional communications fabric. 

The general purpose, programmable media processor of 
the invention combines in a single device all of the necessary 
hardware included in the specialized processor combina- 
tions to process and communicate digital media data streams 
in real-time. The general purpose, programmable media
processor is therefore cheaper and more flexible than the 
prior approach to media processing. The general purpose, 
programmable media processor is thus more susceptible to 
incorporation within a massively parallel processing net- 
work of general purpose media processors that enhance the 
ability to provide real-time multi-media communications to 
the masses. 

These features are accomplished by deploying server
media processors and client media processors throughout the
network. Such a network provides a seamless, global media
super-computer which allows programmers and network
owners to virtualize resources. Rather than restrictively
accessing only the memory space and processing time of a
local resource, the system allows access to resources
throughout the network. In small access points such as
wireless devices, where very little memory and processing
logic is available due to limited battery life, the system is 
able to draw upon the resources of a homogeneous multi- 
computer system. 

The invention also allows network owners the facility to 
track standards and to deploy new services by broadcasting 
software across the network rather than by instituting costly 
hardware upgrades across the whole network. Broadcasting 
software across the network can be performed at the end of 
an advertisement or other program that is broadcast
nationally. Thus, services can be advertised and then trans- 
mitted to new subscribers at the end of the advertisement. 

These and other features and advantages of the invention 
will be apparent upon consideration of the following 
detailed description of the presently preferred embodiments 
of the invention, taken in conjunction with the appended 
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a broad band media computer 
employing the general purpose, programmable media pro- 
cessor of the invention; 

FIG. 2 is a block diagram of a global media processor 
employing multiple general purpose media processors 
according to the invention; 

FIG. 3 is an illustration of the digital bandwidth spectrum 
for telecommunications, media and computing communica- 
tions; 

FIG. 4 is the digital bandwidth spectrum shown in FIG. 3
taking into account the bandwidth overhead associated with
compressed video techniques;

FIG. 5 is a block diagram of the current specialized
processor solution for mass media communication, where
FIG. 5(a) shows the current distributed system, and FIG.
5(b) shows a possible integrated approach;

FIG. 6 is a block diagram of two presently preferred
general purpose media processors, where FIG. 6(a) shows a
distributed system and FIG. 6(b) shows an integrated media
processor;

FIG. 7 is a block diagram of the presently preferred
structure of a general purpose, programmable media
processor according to the invention;

FIG. 8 is a drawing consisting of visual illustrations of the
various group operations provided on the media processor,
where FIG. 8(a) illustrates the group expand operation, FIG.
8(b) illustrates the group compress or extract operation, FIG.
8(c) illustrates the group deal and shuffle operations, FIG.
8(d) illustrates the group swizzle operation and FIG. 8(e)
illustrates the various group permute operations;

FIG. 9 shows the preferred instruction and data sizes for
the general purpose, programmable media processor, where
FIG. 9(a) is an illustration of the various instruction formats
available on the general purpose, programmable media
processor, FIG. 9(b) illustrates the various floating-point
data sizes available on the general purpose media processor,
and FIG. 9(c) illustrates the various fixed-point data sizes
available on the general purpose media processor;

FIG. 10 is an illustration of a presently preferred memory
management unit included in the general purpose processor
shown in FIG. 7, where FIG. 10(a) is a translation block
diagram and FIG. 10(b) illustrates the functional blocks of
the transaction lookaside buffer;

FIG. 11 is an illustration of a super-string pipeline
technique;

FIG. 12 is an illustration of the presently preferred
super-spring pipeline technique;

FIG. 13 is a block diagram of a single memory channel for
communication to the general purpose media processor;

FIG. 14 is an illustration of the presently preferred
connection of standard memory devices to the preferred
memory interface;

FIG. 15 is a block diagram of the input/output controller
for use with the memory channel shown in FIG. 13;

FIG. 16 is a block diagram showing multiple memory
channels connected to the general purpose media processor
shown in FIG. 7, where FIG. 16(a) shows a two-channel
implementation and FIG. 16(b) illustrates a twelve-channel
embodiment;

FIG. 17 illustrates the presently preferred packet
communications protocol for use over the memory channel shown
in FIG. 13;

FIG. 18 shows a multi-processor configuration employing
the general purpose media processor shown in FIG. 7, where
FIG. 18(a) shows a linear processor configuration, FIG.
18(b) shows a processor ring configuration, and FIG. 18(c)
shows a two-dimensional processor configuration; and

FIG. 19 shows a presently preferred multi-chip
implementation of the general purpose, programmable media
processor of the invention.

Referring to the drawings, where like reference numerals
refer to like elements throughout, a broad band
microcomputer 10 is provided in FIG. 1. The broad band
microcomputer 10 consists essentially of a general purpose media
processor 12. As will be described in more detail below, the
general purpose media processor 12 receives, processes and
transmits media data streams in a bi-directional manner from
upstream network components to downstream devices. In
general, media data streams received from upstream
network components can comprise any combination of audio,
video, radio, graphics, encryption, authentication, and
networking information. As those skilled in the art will
appreciate, however, the general purpose media processor
12 is in no way limited to receiving, processing and
transmitting only these types of media information. The general
purpose media processor 12 of the invention is capable of
processing any form of digital media information without
departing from the spirit and essential scope of the
invention.

[illegible] Configuration

In the preferred embodiment of the invention shown in
FIG. 1, [illegible] communicated to the media
processor 12 [illegible]. Ideally, unified media
[illegible] received and transmitted by the general
purpose media processor 12 over a fiber optic cable network.
As described in more detail below, although a
fiber optic cable network is preferred, the presently existing
communications network in the United States consists of a
combination of fiber optic cable, coaxial cable and other
transmission media. Consequently, the general purpose
media processor 12 can also receive and transmit media data
streams over traditional twisted pair
wire connections 16. The specific communications protocol
employed over the twisted pair 16, whether POTS, ISDN or
ADSL, is not essential; all protocols are supported by the
broad band microcomputer 10. The details of these protocols
are generally known to those skilled in the art and no further
discussion is therefore needed or provided herein.

Another form of upstream network communication is
through a satellite link 18. The satellite link 18 is typically
connected to a satellite receiver 20. The satellite receiver 20
comprises an antenna, usually in the form of a satellite dish,
and amplification circuitry. The details of such satellite
communications are also generally known in the art, and
further detail is therefore not provided or included herein.

As described above, the general purpose media processor
12 communicates in a bi-directional manner to receive,
process and transmit media data streams to and from
downstream devices. As shown in FIG. 1, downstream
communication preferably takes place in at least two forms. First,
media data streams can be communicated over a
bi-directional local network 22. Various types of local
networks 22 are generally known in the art and many different
forms exist. The general purpose media processor 12 is
capable of communicating over any of these local networks
22 and the particular type of network selected is
implementation specific.

The local network 22 is preferably employed to
communicate between the unified processor 12 and audio/visual
devices 24 or other digital devices 26. Presently preferred
examples of audio/visual devices 24 include digital cable
television, video-on-demand devices, electronic yellow
pages services, integrated message systems, video
telephones, video games and electronic program guides. As
those skilled in the art will appreciate, other forms of
audio/video devices are contemplated within the spirit and
scope of the invention. Presently preferred embodiments of
other digital devices 26 for communication with the general
purpose media processor 12 include personal computers,
television sets, work stations, digital video camera
recorders, and compact disc read-only memories. As those
skilled in the art will also appreciate, further digital devices
26 are contemplated for communication to the general
purpose media processor 12 without departing from the
spirit and scope of the invention.

Second, the general purpose media processor preferably
also communicates with downstream devices over a wireless
network 28. In the presently preferred embodiment of the
invention, wireless devices for communication over the
wireless network 28 can comprise either remote
communication devices 30 or remote computing devices 32. Presently
preferred embodiments of the remote communications
devices 30 include cordless telephones and personal
communicators. Presently preferred embodiments of the remote
computing devices 32 include remote controls and
telecommunicating devices. As those skilled in the art will
appreciate, other forms of remote communication devices 30
and remote computing devices 32 are capable of
communication with the general purpose media processor 12 without
departing from the spirit and scope of the invention. An agile
digital radio (not shown) that incorporates a general purpose

The general purpose media processor 12 is operable at
significantly high band widths in order to receive, process
and transmit unified media data streams. Referring to FIG.
3, the respective frequencies for various types of media data
streams are set forth against a bandwidth spectrum 60. The
bandwidth spectrum 60 includes three component
spectrums, all along the same range of frequencies, which
[illegible] the various frequency rates of digital media
communications. Current computing bandwidth capabilities are
also displayed. The telecommunications spectrum 62 shows
[illegible] frequency bands used for telecommunications
transmission. For example, teletype terminals and modems
operate in a range between approximately 64 bits/second to
[illegible] kilobits/second. The ISDN telecommunication protocol
operates at 64 kilobits/second. At the upper end of the
telecommunications spectrum 62, T1 and T3 trunks operate
at one megabit per second and 32 megabits per second,
respectively. The SONET frequency range extends from
approximately 128 megabits per second up to approximately
32 gigabits per second. Accordingly, in order to carry such
broad band communications, the general purpose media

media processor 12 may be used to communicate with these processor 12 is capable of transferring information at rates 

wireless devices. into the gigabits per second range or higher. 

Network Configuration A spectrum of typical media data streams is presented in 

Referring now to FIG, 2, the general purpose media 25 the media spectrum 64 shown in FIG. 3. Voice and music 

processor 12 is preferably disposed throughout a digital transmissions are centered at frequencies of approximately 

communications network 38. In order to enable communi- 64 kilobits per second and one megabit per second, respec- 

cation among large and small businesses, residential cus- tively. At the upper end of the media spectrum 64, video 

tomers and mobile users, the network 38 can consist of a transmission takes place in a range from 128 megabits per 

combination of many individual sub-networks comprised of 30 second for high density television up to over 256 gigabits per 

three main forms of interconnection. The trunk and main second for movie applications. When using common video 

branches of the network 38 preferably employ fiber optic compression techniques, however, the video transmission 

cable 40 as the preferred means of interconnection. Fiber spectrum can be shifted down to between 32 kilobits per 

optic cable 40 is used to connect between general purpose second to 128 megabits per second as a result of the data 

media processors 12 disposed as network servers 46 or large 35 compression. As described below, the processing required to 

business installations 48 that arc capable of coupling directly achieve the data compression results in an increase in 

to the fiber optic link 40. For communications to small bandwidth requirements. 

business and residential customers that may be incapable of Current computing bandwidths are shown in the comput- 

directly coupling to the fiber optic cable 40, a general ing spectrum 66 of FIG. 3. Serial communications presently 

purpose media processor 12 can be used as an interface to 40 take place in a range between two kilobits per second up to 

other forms of network interconnection. 512 kilobits per second. The Ethernet network protocol 

As shown in FIG. 2, alternate forms of interconnection operates at approximately 8 megabits per second. Current 

consist of coaxial cable lines 42 and twisted pair wiring 44, dynamic random access memory and other digital input/ 

Coaxial cable lines are currently in place throughout the output peripherals operate between 32 megabits per second 

U.S. and is typically employed to provide cable television 45 and 512 megabits per second. Presently available micropro- 

services to residential homes. According to the preferred cessors are capable of operation in the low gigabits per 

embodiment of the invention, general purpose media pro- second range. For example, the '386 Pentium microproces- 

cessors 12 can be installed at these residential locations 52. sor manufactured by Intel Corporation of Santa Clara, Calif. 

In contrast to the specialized processor approach, the general operates in the lower half of that range, and the Alpha 

purpose media processor 12 provides enough bandwidth to 50 microprocessor manufactured by Digital Equipment Corpo- 

allow for bi-directional communications to and from these ration approaches the 16 gigabits per second range, 

residential locations 52. When video compression is employed, as expressed 

Network servers 46 controlled by general purpose media above, the associated processing overhead reduces the effec- 

processois 12 are also employed throughout the network 38. live bandwidth of the particular processor. As a result, in 

For example, the network servers 46 can be used to interface 55 order to handle compressed video, these processors must 

between the fiber optic network 40 and twisted pair wiring operate in the terahertz frequency range. The bandwidth 

44. TWisted pair wiring 44 is still employed for small spectrum 60 shown in FIG. 4 represents the effect of 

businesses 50 and residential locations 52 that do not or handling media data streams including compressed video, 

cannot currently subscribe to coaxial cable or fiber optic The computing spectrum 66 is skewed down to properly 

network services. General purpose media processors 12 are 60 align the computing bandwidth requirements with the tele- 

also disposed at these small business locations 50 and communications spectrum 62 and the media spectrum 64. 

non-cable residential locations 52. General purpose media Accordingly, current processor technology is not sufficient 

processors 12 are also installed in wireless or mobile loca- to handle the transmission and processing associated with 

tions 52, which are coupled to the network 38 through agile complex streams of multi-media data, 

digital radios (not shown). As shown in FIG. 2, network 65 The current specialized processor approach to media 

databases or other peripherals 56 can also coupled to general processing is illustrated in the block diagram shown in FIG. 

purpose media processors 12 in the network 38. 5. As shown in FIG. 5, special purpose processors are 
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coupled to a back plane 70, which is capable of transmitting media data can be processed in parallel by the general 

instructions and data at the upper kilobits to lower gigabits purpose media processor 12. 

per second range. In a typical configuration, an audio Coupled to the multi-precision ALU 102 via the data path 

processor 76, video processor 78, graphics processor 80 and 108, and also an clement of the execution unit 100, is a 

network processor 82 are all coupled to the back plane 70. 5 programmable switch 104. The programmable switch 104 

Each of the audio, video, graphics and network processors performs data handling operations on single or unified media 

76-82 typically employ their own private or dedicated data streams transmitted over the datapath 108. Examples of 

memories 84, which are only accessible to the specific such data handling operations include deals, shuffles, shifts, 

processor and not accessible over the back plane 70. As expands, compresses, swizzles, permutes and reverses, 

described above, however, unless video data streams arc lo although other data handling operations arc contemplated, 

constantly being processed, for example, the video processor Tliese operations can be performed on single bits or bit fields 

78 will sit idle for periods of time. The computing power of consisting of two or more bits up to the entire width of the 

the dedicated video processor 78 is thus only available to data path 108. Thus, single bits or bit fields of various sizes 

handle video data streams and is not available to handle can be manipulated through programmable operation of the 

other media data streams that are directed to other dedicated is switch 104. 

processors. This, of course, is an inefficient use of the video Examples of the presently preferred data manipulation 

processor 78 particularly in view of the overall processing operations performed by the general purpose media proces- 

capability of this multi-processor system. sor 12 are shown in FIG. 8. A group expand operation is 

The general purpose media processor 12, in contrast, visually illustrated in FIG. 8(fl). According to the group 

handles a data stream of audio, video, graphics and network 20 expand operation,- a sequential field of bits 270 can be 

information all at the same time with the same processor. In divided into constituent sub-fields 272£il4 212d for insertion 

order to handle the ever changing combination of data types, into a larger field array 274. The reverse of the group expand 

the general purpose media processor 12 is dynamically operation is a group compress or extract operation. A visual 

partitionable to allocate the appropriate amount of process- illustration of the group compress or extract operation is 

ing for each combination of media in a unified media data 25 shown in FIG. 8(t). As shewn, separate sub-fields 

stream. A block diagram of two preferred general purpose 212a-212d from a larger bit field 274 can be combined to 

media processor system configurations is shown in FIG. 6. form a contiguous or sequential field of bits 270. 

Referring to FIG, 6(a), a general purpose media processor Referring to FIGS. 8(c)-8(e), group deal, shuffle, swizzle 

12 is coupled to a high-speed back plane 90. The presently and permute operations performed by the programmable 

preferred back plane 90 is capable of operation at 30 gigabits 30 switch 104 are also illustrated. The operations performed by 

per second. As those skilled in the art will appreciate, back these instructions are readily understood from a review of 

planes 90 that are capable of operation at 400 gigabits per the drawings. The group manipulation operations illustrated 

second or greater bandwidth are envisioned within the spirit in FIGS. 8(a)-8(e) comprise the presently contemplated data 

and scope of the invention. Multiple memory devices 92 arc manipulation operations for the general purpose media pro- 

also coupled to the back plane 90, which are accessible by 35 cessor 12. As those skilled in the art will appreciate, either 

the general purpose media processor 12. Input/output a subset of these operations or additional data manipulation 

devices 94 are coupled to the back plane 90 through a operations can be incorporated in other alternate embodi- 

dual-ported memory 92. The configuration of the input/ ments of the general purpose media processor 12 without 

output devices 94 on one end of the dual-ported memory 92 departing from the spirit and scope of the invention, 

allows the sharing of these memory devices 92 throughout 40 Referring again to FIG. 7, higher level mathematical 

a network 38 of general purpose media processors 12. operations than those performed by the multi-precision ALU 

Alternatively, FIG. 6(6) shows a presently preferred inte- 102 are performed in the general purpose media processor 

grated general purpose media processor 12. The integrated 12 through an extended naath element 106. The extended 

processor includes on-board memory and I/O 86. The math element 106 is coupled lo the data path 108 and also 

on-board memory is preferably of sufficient size to optimize 45 comprises part of the execution unit 100. The extended math 

throughput, and can comprise a cache and/or buffer memory element 106 performs the complex arithmetic operations 

or the like. The integrated media processor 12 also connects necessary for video data compression and similarly intensive 

to external memory 88, which is preferably larger than the mathematical operations. One presently preferred example 

on-board memory 86 and forms the system main memory. of an extended math operation comprises a Galois field 

Execution Unit 50 operation. Other examples of extended mathematical func- 

One presently preferred embodiment of an integrated tions performed by the extended math element 106 include 
general purpose media processor 12 is shown in FIG. 7. The CRC generation and checking, Reed-Solomon code genera- 
core of the integrated general purpose media processor 12 tion and checking, and spread-spectrum encoding and 
comprises an execution unit 100. Three main elements or decoding. As those skilled in the art appreciate, additional 
subsections are included in the execution unit 100. A mul- 55 mathematical operations are possible and contemplated, 
tiple precision arithmetic/logic unit ("ALU'^ 102 performs According to the preferred embodiment of the integrated 
all logical and simple arithmetic operations on incoming general purpose media processor 12, a register file 110 is 
media data streams. Such operations consist of calculate and provided in addition to the execution unit 100 to process 
control operations such as Boolean functions, as well as media data. The register file 110 stores and transmits data 
addition, subtraction, multiplication and division. These 60 streams to and firom the execution unit 100 via the data path 
operations are performed on single or unified media data 108, Rather than employing a complex set of specific or 
streams transmitted to and from the multiple precision ALU dedicated registers, the general purpose media processor 12 
102 over a data bus or data path 108. Preferably the data path preferably includes 64 general purpose registers in the 
108 is 128 bits wide, although those skilled in the art will register file 110 along with one program counter (not 
appreciate that the data path 108 can take on any width or 65 shown). The 64 general purpose registers contained in the 
size without departing from the spirit and scope of the register file 110 are all available to the user/programmer, and 
invention. Th& wider the data path 108 the more unified comprise a portion of the user state of the general purpose 
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media processor 12. The general purpose registets are pref- 
erably capable of storing any form of data. Each register 
within the register file 110 is coupled to the data path 108 
and is accessible to the execution unit 100 in the same 
manner. Thxis, the user can employ a general purpose 5 
register according to the specific needs of a particular 
program or unique application. As those skilled in the art 
will appreciate, the register file 110 can also comprise a 
plurality of register files 110 configured in parallel in order 
to support parallel multi-threaded processing. lO 
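
The group expand and group compress operations of FIGS. 8(a) and 8(b) can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the function names, the use of equal-width sub-fields, and the zero fill of unused slot bits are hypothetical and are not taken from the Microfiche Appendix.

```python
def group_expand(bits: int, total_width: int, field_width: int, slot_width: int) -> int:
    """Spread each field_width-bit sub-field of `bits` into its own slot_width-bit slot."""
    if field_width <= 0 or total_width % field_width or slot_width < field_width:
        raise ValueError("widths must divide evenly and each slot must hold a sub-field")
    mask = (1 << field_width) - 1
    expanded = 0
    for i in range(total_width // field_width):
        sub_field = (bits >> (i * field_width)) & mask  # i-th sub-field, lowest first
        expanded |= sub_field << (i * slot_width)       # unused slot bits stay zero (assumption)
    return expanded


def group_compress(bits: int, count: int, field_width: int, slot_width: int) -> int:
    """Reverse of group_expand: gather `count` sub-fields back into a contiguous field."""
    if field_width <= 0 or slot_width < field_width:
        raise ValueError("each slot must hold a sub-field")
    mask = (1 << field_width) - 1
    packed = 0
    for i in range(count):
        sub_field = (bits >> (i * slot_width)) & mask   # pull the sub-field out of its slot
        packed |= sub_field << (i * field_width)        # place it next to its neighbor
    return packed


# Four 4-bit sub-fields of a 16-bit field, each expanded into an 8-bit slot.
wide = group_expand(0xABCD, total_width=16, field_width=4, slot_width=8)
narrow = group_compress(wide, count=4, field_width=4, slot_width=8)
```

In this sketch the compress step is the exact inverse of the expand step, which mirrors the statement that group compress (or extract) is the reverse of group expand.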
Instruction Set and User Programming 

Control or manipulation of data processed by the general 
purpose media processor 12 is achieved by selected instruc- 
tions programmed by the user. Those skilled in the art will 
appreciate that a great number of programs are possible
through various sequences of instructions. Particular pro-
grams can be developed for each unique implementation of
the general purpose media processor 12. A detailed discus-
sion of such specific programs is therefore beyond the scope
of this description.

One presently preferred instruction set for the general purpose media processor 12 is included in the Microfiche Appendix, the contents of which are hereby incorporated herein by reference. A list of the presently preferred major operation codes for the general purpose media processor 12 appears below in Table I.

TABLE I

MAJOR OPERATION CODES

major operation code field values

MAJOR  0               32              64         96           128      160        192       224
0      ERES            GSHUFFLEI       FMULADD16  GMULADD1     LU16LAI  SAAS64LAI  EADDIO    BFE16
1      ESHUFFLEI4MUX   GSHUFFLEI4MUX   FMULADD32  GMULADD2     LU16BAI  SAAS64BAI  EADDIUO   BFNUE16
2      -               GSELECT8        FMULADD64  GMULADD4     LU16LI   SCAS64LAI  ESETIL    BFNUGE16
3      EMDEPI          GMDEPI          -          GMULADD8     LU16BI   SCAS64BAI  ESETIGE   BFNUL16
4      EMUX            GMUX            FMULSUB16  GMULADD16    LU32LAI  SMAS64LAI  ESETIE    BFE32
5      E8MUX           G8MUX           FMULSUB32  GMULADD32    LU32BAI  SMAS64BAI  ESETINE   BFNUE32
6      EGFMUL64        GGFMUL8         FMULSUB64  GMULADD64    LU32LI   SMUX64LAI  ESETIUL   BFNUGE32
7      ETRANSPOSE8MUX  GTRANSPOSE8MUX  -          GEXTRACT128  LU32BI   SMUX64BAI  ESETIUGE  BFNUL32
8      -               -               -          -            L16LAI   S16LAI     ESUBIO    BFE64
9      ESWIZZLE        GSWIZZLE        -          GUMULADD2    L16BAI   S16BAI     ESUBIUO   BFNUE64
10     -               GSWIZZLECOPY    -          GUMULADD4    L16LI    S16LI      ESUBIL    BFNUGE64
11     -               GSWIZZLESWAP    -          GUMULADD8    L16BI    S16BI      ESUBIGE   BFNUL64
12     EDEPI           GDEPI           F.16       GUMULADD16   L32LAI   S32LAI     ESUBIE    BFE128
13     EUDEPI          GUDEPI          F.32       GUMULADD32   L32BAI   S32BAI     ESUBINE   BFNUE128
14     EWTHI           GWTHI           F.64       GUMULADD64

As shown in Table I, the major operation codes are grouped according to the function performed by the operations. The operations are thus arranged and listed above according to the presently preferred operation code number for each instruction. As many as 255 separate operations are contemplated for the preferred embodiment of the general purpose media processor 12. As shown in Table I, however, not all of the operation codes are presently implemented. As those skilled in the art will appreciate, alternate schemes for organizing the operation codes, as well as additional operation codes for the general purpose media processor 12, are possible.

The instructions provided in the instruction set for the general purpose media processor 12 control the transfer, processing and manipulation of data streams between the register file 110 and the execution unit 100. The presently preferred width of the instruction path 112 is 32 bits, organized as four eight-bit bytes ("quadlets"). Those skilled in the art will appreciate, however, that the instruction path 112 can take on any width without departing from the spirit and scope of the invention. Preferably, each instruction within the instruction set is stored or organized in memory on four-byte boundaries. The presently preferred format for instructions is shown in FIG. 9(a).

As shown in FIG. 9(a), each of the presently preferred instruction formats for the general purpose media processor 12 includes a field 280 for the major operation code number shown in Table I. Based on the type of operation performed, the remaining bits can provide additional operands according to the type of addressing employed with the operation. For example, the remainder of the 32-bit instruction field can comprise an immediate operand ("imm"), or operands stored in any of the general registers ("ra", "rb", "rc", and "rd"). In addition, minor operation codes 282 can also be included among the operands of certain 32-bit instruction formats.
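
As a rough illustration of how a 32-bit instruction word with a major operation code field and register operands might be packed and unpacked, consider the following sketch. The bit positions are assumptions made for this example only: an 8-bit major operation code in the top byte and four 6-bit register fields (6 bits being enough to name one of 64 general registers). The actual field layout is the one shown in FIG. 9(a) and the Microfiche Appendix, and the helper names `encode` and `decode` are hypothetical.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    major: int  # major operation code (assumed to occupy bits 31..24)
    ra: int     # register operand fields, 6 bits each (assumed layout)
    rb: int
    rc: int
    rd: int


def encode(major: int, ra: int, rb: int, rc: int, rd: int) -> int:
    """Pack one instruction into a 32-bit word using the assumed layout."""
    if not 0 <= major < 256:
        raise ValueError("major operation code must fit in 8 bits")
    for reg in (ra, rb, rc, rd):
        if not 0 <= reg < 64:
            raise ValueError("register number must be in 0..63")
    return (major << 24) | (ra << 18) | (rb << 12) | (rc << 6) | rd


def decode(word: int) -> Decoded:
    """Unpack a 32-bit instruction word into its assumed fields."""
    return Decoded(
        major=(word >> 24) & 0xFF,
        ra=(word >> 18) & 0x3F,
        rb=(word >> 12) & 0x3F,
        rc=(word >> 6) & 0x3F,
        rd=word & 0x3F,
    )


word = encode(36, 1, 2, 3, 63)
fields = decode(word)
```

Note that 8 + 4 x 6 = 32, so this hypothetical format exactly fills the four-byte instruction; formats with an immediate operand or a minor operation code would divide the lower 24 bits differently.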

The presently preferred embodiment of the general pur- 
pose media processor 12 includes a limited instruction set 
similar to those seen in Reduced Instruction Set Computer
("RISC") systems. The preferred instruction set for the
general purpose media processor 12 shown in Table I 
includes operations which implement load, store, 
synchronize, branch and gateway functions. These five 
groups of operations can be visually represented as two 
general classes of related operations. The branch and gate- 
way operations perform related functions on media data 
streams and are thus visually represented as block 114 in 
FIG. 7. Similarly, the load, store and synchronize operations 
are grouped together in block 116 and perform similar 
operations on the media data streams. (Blocks 114 and 116 
only represent the above classification of these operations 
and their function in the processing of media data streams, 
and do not indicate any specific underlying electronic 
connections.) A more detailed discussion of these 
operations, and the functionality of the general purpose 
media processor 12, appears in the Microfiche Appendix. 

The four-byte structure of instructions for the general 
purpose media processor 12 is preferably independent of the 
byte ordering used for any data structures. Nevertheless, the 
gateway instructions are specifically defined as 16-byte 
structures containing a code address used to securely invoke 
a procedure at a higher privilege level. Gateways are pref- 
erably marked by protection information specified in the 
translation lookaside buffer 148 in the memory management 
unit 122. Gateways are thus preferably aligned on 16-byte 
boundaries in the external memory. In addition to the general 
purpose registers and program counter, a privilege level 
register is provided within the register file 110 that contains 
the privilege level of the currently executing instruction. 

The instruction set preferably includes load and store 
instructions that move data between memory and the register 
file 110, branch instructions to compare the content of 
registers and transfer control, and arithmetic operations to 
perform computations on the contents of registers. Swap 
instructions provide multi-thread and multi-processor syn- 
chronization. These operations are preferably indivisible and 
include such instructions as add-and-swap, compare-and-
swap, and multiplex-and-swap instructions. The fixed-point
compare-and-branch instructions within the instruction set 
shown in Table I provide the necessary arithmetic tests for 
equality and inequality of signed and unsigned fixed-point 
values. The branch through gateway instruction provides a 
secure means to access code at a higher privilege level in
a form similar to a high level language procedure call 
generally known in the art. 
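
The indivisible swap operations mentioned above can be sketched in software. The example below is a minimal model, assuming the conventional semantics of compare-and-swap and of add-and-swap (fetch-and-add); the class name `SharedMemory`, the 64-bit wrap-around, and the use of a lock to stand in for hardware indivisibility are all assumptions of this sketch, not details from the Microfiche Appendix.

```python
import threading

MASK64 = (1 << 64) - 1  # assumed operand width for this sketch


class SharedMemory:
    """Toy memory whose swap operations are indivisible (modeled with a lock)."""

    def __init__(self) -> None:
        self._cells = {}
        self._lock = threading.Lock()  # stands in for hardware indivisibility

    def load(self, addr: int) -> int:
        with self._lock:
            return self._cells.get(addr, 0)

    def compare_and_swap(self, addr: int, expected: int, new: int) -> int:
        """Store `new` only if the cell holds `expected`; always return the old value."""
        with self._lock:
            old = self._cells.get(addr, 0)
            if old == expected:
                self._cells[addr] = new & MASK64
            return old

    def add_and_swap(self, addr: int, delta: int) -> int:
        """Add `delta` to the cell and return the value it held before."""
        with self._lock:
            old = self._cells.get(addr, 0)
            self._cells[addr] = (old + delta) & MASK64
            return old


def count_with_threads(num_threads: int = 4, increments: int = 1000) -> int:
    """Several threads bump one shared counter; no update is lost."""
    mem = SharedMemory()

    def worker() -> None:
        for _ in range(increments):
            mem.add_and_swap(0x1000, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.load(0x1000)
```

Because each read-modify-write completes as a unit, the final count equals the number of threads times the number of increments, which is the property that makes such instructions usable for multi-thread and multi-processor synchronization.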

The general purpose media processor 12 also preferably 
supports floating-point compare-and-branch instructions. 
The arithmetic operations, which are supported in hardware, 
include floating-point addition, subtraction, multiplication, 
division and square root. The general purpose media pro- 
cessor 12 preferably supports other floating-point operations 
defined by the ANSI-IEEE floating-point standard through 
the use of software libraries. A floating point value can 
preferably be 16, 32, 64 or 128-bits wide. Examples of the 
presently preferred floating-point data sizes are illustrated
in FIG. 9(b).

The general purpose media processor 12 preferably sup- 
ports virtual memory addressing and virtual machine opera- 
tion through a memory management unit 122. Referring to 
FIG. 10(a), one presently preferred embodiment of the 
memory management unit 122 is shown. The memory
management unit 122 preferably translates global virtual
addresses into physical addresses by software program- 
mable routines augmented by a hardware translation looka- 
side buffer ("TLB") 148. A facility for local virtual address 
translation 164 is also preferably provided. As those skilled 
in the art will appreciate, the memory management unit 122 
includes a data cache 166 and a tag cache 168 that store data 
and tags associated with memory sections for each entry in 
the TLB 148. 

A block diagram of one preferred embodiment of the TLB
148 is shown in FIG. 10(b). The TLB 148 receives a virtual
address 230 as its input. For each entry in the TLB 148, the
virtual address 230 is logically AND-ed with a mask 232.
The output of each respective AND gate 234 is compared via
a comparator 236 with each entry in the TLB 148. If a match
is detected, an output from the comparator 236 is used to
gate data 240 through a transceiver 238. As those skilled in
the art will appreciate, a match indicates the entry of the
corresponding physical address within the contents of the
TLB 148 and no external memory or I/O access is required.
The data 240 for the data cache 166 (FIG. 10(a)) is then
combined with the remaining lower bits of the virtual
address 230 through an exclusive-OR gate 242. The result-
ant combination is the physical address 244 output from the
TLB 148. If a match is not detected between the logical
address and the contents of the tag cache 168, the memory
management unit 122 requires an external memory or I/O
access to retrieve the relevant portion of memory and
update the contents of the TLB 148 accordingly.
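
The mask, compare and exclusive-OR steps of the TLB lookup can be sketched as follows. This is a simplified software model, not the circuit of FIG. 10(b): the `TlbEntry` fields, the 64-bit address width, and the assumption that each entry's data has zeros in the offset positions (so that the exclusive-OR simply merges the two parts) are choices made for illustration.

```python
from typing import Iterable, NamedTuple, Optional

ADDR_MASK = (1 << 64) - 1  # assumed 64-bit addresses


class TlbEntry(NamedTuple):
    mask: int  # selects the virtual address bits that take part in the compare
    tag: int   # value that (virtual_address & mask) must equal for a match
    data: int  # physical address bits gated out on a match (offset bits zero)


def tlb_lookup(entries: Iterable[TlbEntry], virtual_address: int) -> Optional[int]:
    """Return the physical address on a TLB match, or None on a miss."""
    for entry in entries:
        if virtual_address & entry.mask == entry.tag:
            # Remaining lower bits of the virtual address, combined by exclusive-OR.
            offset = virtual_address & ~entry.mask & ADDR_MASK
            return entry.data ^ offset
    return None  # miss: an external memory or I/O access would be needed


tlb = [TlbEntry(mask=0xFFFFFFFFFFFFF000, tag=0x402000, data=0x7FFF1000)]
hit = tlb_lookup(tlb, 0x402ABC)
miss = tlb_lookup(tlb, 0x500000)
```

Because the mask is stored per entry, an entry with fewer mask bits set would cover a larger region of memory with the same lookup logic.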

Using generally known memory management techniques, 
the memory management unit 122 ensures that instructions 
(and data) are properly retrieved from external memory (or
other sources) over an external input/output bus 126 (see 
FIG. 7). As described in more detail below, a high bandwidth 
interface 124 is coupled to the external input/output bus 126 
to communicate instructions (and media data streams) to the 
general purpose media processor 12. The presently preferred 
physical address width for the general purpose media pro- 
cessor 12 is eight bytes (64-bits). In addition, the memory 
management unit 122 preferably provides match bits (not 
shown) that allow large memory regions to be assigned a 
single TLB entry allowing for fine grain memory manage- 
ment of large memory sections. The memory management 
unit 122 also preferably includes a priority bit (not shown) 
that allows for preferential queuing of memory areas accord- 
ing to respective levels of priority. Other memory manage- 
ment operations generally known in the art are also per- 
formed by the memory management unit 122. 

Referring again to FIG. 7, instructions received by the
general purpose media processor 12 are stored in a com-
bined instruction buffer/cache 118. The instruction buffer/
cache 118 is dynamically subdivided to store the largest
sequence of instructions capable of execution by the execu-
tion unit 100 without the necessity of accessing external
memory. In a preferred embodiment of the invention,
instruction buffer space is allocated to the smallest and most
frequently executed blocks of media instructions. The
instruction buffer thus helps maintain the high bandwidth
capacity of the general purpose media processor 12 by
sustaining the number of instructions executed per second at
or near peak operation. That portion of the instruction
buffer/cache 118 not used as a buffer is, therefore, available
to be used as cache memory. The instruction buffer/cache
118 is coupled to the instruction path 112 and is preferably
32 kilobytes in size.

A data buffer/cache 120 is also provided to store data 
transmitted and received to and from the execution unit 100
and register file 110. The data buffer/cache 120 is also
dynamically subdivided in a manner similar to that of the 
instruction buffer/cache 118. The buffer portion of the data 
buffer/cache 120 is optimized to store a set size of unified 
media data capable of execution without the necessity of 
accessing external memory. In a preferred embodiment of 
the invention, data buffer space is allocated to the smallest 
and most frequently accessed working sets of media data. 
Like the instruction buffer, the data buffer thus maintains 
peak bandwidth of the general purpose media processor 12. 
The data buffer/cache 120 is coupled to the data path 108 
and is preferably also 32 kilobytes in size. 

The preferred embodiment of the general purpose media 
processor 12 includes a pipelined instruction pre-fetch struc- 
ture. Although pipelined operation is supported, the general 
purpose media processor 12 also allows for non-pipelined 
operations to execute without any operational penalty. One 
preferred pipeline structure for the general purpose media 
processor 12 comprises a "super-string" pipeline shown in 
FIG. 11. A super-string pipeline is designed to fetch and 
execute several instructions in each clock cycle. The instruc- 
tions available for the general purpose media processor 12 
can be broken down into five basic steps of operation. These 
steps include a register-to-register address calculation, a 
memory load, a register-to-register data calculation, a 
memory store and a branch operation. According to the 
super-string pipeline organization of the general purpose 
media processor 12, one instruction from each of these five 
types may be issued in each clock cycle. The presently
preferred ordering of these operations is as listed above,
where the five steps are assigned the letters "A," "L,"
"E," "S" and "B" (see FIG. 11).

According to the super-string pipelining technique, each
of the instructions is serially dependent, as shown in FIG.
11, and the general purpose media processor 12 has the
ability to issue a string of dependent instructions in a single
clock cycle. These instructions shown in FIG. 11 can take
from two to five cycles of latency to execute, and a branch
prediction mechanism is preferably used to keep the
pipeline filled (described below). Instructions can be
encoded in unit categories such as address, load, store/sync, 
fixed, float and branch to allow for easy decoding. A similar 
scheme is employed to pre-fetch data for the general purpose 
media processor 12. 
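
One way to picture the super-string issue rule is to group an instruction sequence into "strings" that each hold at most one instruction per step, in the order A, L, E, S, B. The greedy grouping below is a hypothetical illustration of that rule only; the function name and the single-letter class encoding are assumptions, and the sketch does not model latency or the actual issue logic of FIG. 11.

```python
ISSUE_ORDER = "ALESB"  # address, load, execute (data calculation), store, branch


def pack_strings(instruction_classes):
    """Greedily split a sequence of class letters into issue strings.

    A new string starts whenever the next instruction's class does not come
    strictly later in ISSUE_ORDER than the previous one in the same string.
    """
    strings = []
    current = []
    last_position = -1
    for letter in instruction_classes:
        position = ISSUE_ORDER.index(letter)  # raises ValueError for unknown classes
        if current and position <= last_position:
            strings.append("".join(current))
            current = []
        current.append(letter)
        last_position = position
    if current:
        strings.append("".join(current))
    return strings


groups = pack_strings("ALESBALEEB")
```

In this model a full string such as ALESB would issue in one clock cycle, while a sequence with two consecutive data calculations is split across two cycles.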

As those skilled in the art will appreciate, the super-string 
pipeline can be implemented in a multi-threaded environ- 
ment. In such an implementation, the number of threads is 
preferably relatively prime with respect to functional unit
rates so that functional units can be scheduled in a non- 
interfering fashion between each thread. 
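
The benefit of a thread count that is relatively prime to a functional unit rate can be shown with a small model. The sketch assumes, as one possible reading of the text, that threads take turns round-robin (one thread per cycle) and that a functional unit accepts a new operation every fixed number of cycles; both assumptions and the function names are hypothetical.

```python
from math import gcd


def threads_served(num_threads: int, unit_interval: int) -> set:
    """Threads that ever find the functional unit free on their own cycle.

    Thread t owns cycles where cycle % num_threads == t; the unit accepts
    work on cycles where cycle % unit_interval == 0.
    """
    period = num_threads * unit_interval  # the pattern repeats after this many cycles
    return {cycle % num_threads for cycle in range(0, period, unit_interval)}


def is_non_interfering(num_threads: int, unit_interval: int) -> bool:
    """True when the two counts share no common factor."""
    return gcd(num_threads, unit_interval) == 1


all_threads = threads_served(5, 2)   # 5 and 2 are relatively prime
some_threads = threads_served(4, 2)  # 4 and 2 share the factor 2
```

With five threads and a unit that accepts work every second cycle, every thread eventually gets the unit on its own turn; with four threads, only two of them ever do, so the others would be starved or would have to interfere with another thread's slot.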

In another more preferred embodiment, a "super-spring" 
pipelining scheme is employed with the general purpose 
media processor 12. The super-spring pipeline technique 
breaks the super-string pipeline shown in FIG. 11 into two 
sections that are coupled via a memory buffer (not shown). 
A visual representation of the super-spring pipeline tech- 
nique is shown in FIG. 12. The front of the pipeline 204, in 
which address calculation (A), memory load (L), and branch 
(B) operations are handled, is decoupled from the back of 
the pipeline 206, in which data calculation (E) and memory 
store (S) operations are handled. The decoupling is accom- 
plished through the memory buffer (not shown), which is 
preferably organized in a first-in-first-out ("FIFO") fast/ 
dense structure. (The memory buffer is functionally repre- 
sented as a spring in FIG. 12.) 
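
The decoupling of the front and back of the pipeline through a FIFO can be modeled in a few lines. In the sketch below, the front end performs the address and load steps and runs ahead until the buffer is full, while the back end performs the data calculation and store steps on the oldest buffered entry. The function name, the buffer depth, and the dictionaries used as memory are assumptions for illustration only.

```python
from collections import deque


def run_decoupled(addresses, memory, operation, depth=4):
    """Process loads and calculations through a bounded first-in-first-out buffer."""
    fifo = deque()
    stored = {}
    next_index = 0
    while next_index < len(addresses) or fifo:
        # Front of the pipeline: address calculation (A) and memory load (L).
        while next_index < len(addresses) and len(fifo) < depth:
            address = addresses[next_index]
            fifo.append((address, memory[address]))
            next_index += 1
        # Back of the pipeline: data calculation (E) and memory store (S).
        address, value = fifo.popleft()
        stored[address] = operation(value)
    return stored


source = {0: 1, 8: 2, 16: 3}
result = run_decoupled([0, 8, 16], source, lambda value: value * 10)
```

The buffer lets loads be issued well before their values are consumed, which is the sense in which the memory buffer acts like a spring between the two halves of the pipeline.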

As indicated in Table I above, the general purpose media
processor 12 does not include delayed branch instructions,
and so relies upon branch or fetch prediction techniques to
keep the pipeline full in program flows around unconditional
and conditional branch instructions. Many such techniques
are generally known in the art. Examples of some presently
preferred techniques include the use of group compare and
set, and multiplex operations to eliminate unpredictable
branches; the use of short forward branches, which cause
pipeline neutralization; and the use of branch and link, which
predicts the return address in a stack of one or more entries.
In addition, the specialized gateway instructions included in
the general purpose media processor 12 allow for branches
to and from protected virtual memory space. The gateway
instructions, therefore, allow an efficient means to transfer
between various levels of privilege.

As described above, two basic forms of media data are 
processed by the general purpose media processor 12, as 
shown in FIG. 7. These data streams generally comprise 
Nyquist sampled I/O 128, and standard memory and I/O
130. As shown in FIG. 7, audio 132, video 134, radio 136,
network 138, tape 140 and disc 142 data streams comprise
some examples of digitally sampled I/O 128. As those
skilled in the art will appreciate, other forms of digitally
sampled I/O are contemplated for processing by the general 
purpose media processor 12 without departing from the 
spirit and scope of the invention. Standard memory and I/O 
130 comprises data received and transmitted to and from 
general digital peripheral devices used in the design of most 
computer systems. As shown in FIG. 7, some examples of 
such devices include dynamic random access memory 
("DRAM") 146, or any data received over the PCI bus 144 
generally known in the art. Other forms of standard memory 
and I/O sources are also contemplated. The various fixed-
point data sizes preferred for the general purpose media
processor 12 are illustrated in FIG. 9(c). 
External Interface 

As mentioned above, the general purpose media processor 
12 includes a high bandwidth interface 124 to communicate 
with external memory and input/output sources. As part of 
the high bandwidth interface 124, the general purpose media 
processor 12 integrates several fast communication channels 
156 (FIG. 13) to communicate externally. These fast com- 
munication channels 156 preferably couple to external 
caches 150, which serve as a buffer to memory interfaces 
152 coupled to standard memory 154. The caches 150 
preferably comprise synchronous static random access
memory ("SRAM"), each of which is sixty-four kilobytes
in size; and the standard memories 154 comprise DRAM's.
The memory interfaces 152 transmit data between the 
caches 150 and the standard memories 154. The standard 
memories 154 together form the main external memory for 
the general purpose media processor 12. The cache 150, 
memory interface 152, standard memory 154 and input/ 
output channel 156 therefore make up a single external
memory unit 158 for the general purpose media processor 
12. 

According to the presently preferred embodiment of the 
invention, the memory interface protocol embeds read and 
write operations to a single memory space into packets 
containing command, address, data and acknowledgment 
information. The packets preferably include check codes
that will detect single-bit transmission errors and some
multiple-bit errors. As many as eight operations may be in
progress at a time in each external memory unit 158. As
shown in FIG. 13, up to four external memory units 158 may
be cascaded together to expand the memory available to the
general purpose media processor 12, and to improve the
bandwidth of the external memory. Through such cascaded
memory units 158, the memory interface 152 provides for
the direct connection of multiple banks of standard memory
154 to maintain operation of the general purpose media
processor 12 at sustained peak bandwidths.

According to one embodiment shown in FIG. 13, up to
four standard memory devices 154 can be coupled to each
memory interface 152. Each standard memory 154 thus
includes as many as four banks of DRAM, each of which is
preferably sixteen bits wide. The standard memories 154 are
connected in parallel to the memory interface 152 forming
a 72-bit wide data bus 160, where 64 bits are preferably
provided for data transfer and eight bits are provided for
error correction. In addition to the data bus 160, an address/
control bus 162 is coupled between the memory interface
152 and each standard memory 154. The address/control bus
162 preferably comprises at least twelve address lines (4
kilobits x 16 memory size) and four control lines as shown in
FIG. 13. An alternate manner for coupling the DRAM's to
the memory interface 152 is illustrated in FIG. 14. As shown
in FIG. 14, two banks of four DRAM single in-line memory
modules are coupled in parallel to the memory interface 152.
The memory interface 152 also supports interleaving to 
enhance bandwidth, and page mode accesses to improve 
latency for localized addressing. 

Using standard DRAM components, the external memory
units 158 achieve bandwidths of approximately two
gigabits/second with the standard memories 154. When four
such external memory units 158 are coupled via the com-
munication channel 156, therefore, the total bandwidth of
the external main memory system increases to one gigabyte/
second. As discussed further below, in implementations with
two or eight communication channels 156, the aggregate
bandwidth increases to two and eight gigabytes/second,
respectively. 
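
The bandwidth figures above follow from simple arithmetic, sketched below. The function and parameter names are hypothetical; the only inputs taken from the text are two gigabits/second per external memory unit and four cascaded units per channel.

```python
BITS_PER_BYTE = 8

def aggregate_gigabytes_per_second(channels: int,
                                   units_per_channel: int = 4,
                                   unit_gigabits_per_second: float = 2.0) -> float:
    """Aggregate external memory bandwidth for a number of communication channels."""
    total_gigabits = channels * units_per_channel * unit_gigabits_per_second
    return total_gigabits / BITS_PER_BYTE

for channels in (1, 2, 8):
    print(channels, aggregate_gigabytes_per_second(channels))
# 1 1.0
# 2 2.0
# 8 8.0
```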

A more detailed depiction of the communication channel
156 circuitry appears in FIG. 15. According to the preferred
embodiment of the invention, each communication channel
156 comprises two unidirectional, byte-wide, differential,
packet-oriented data channels 156a, 156b (see FIG. 13). As
explained above, where memory units 158 are cascaded
together in series, the output of one memory unit 158 is
connected to the input of another memory unit 158. The two
unidirectional channels are thus connected through the
memory units 158 forming a loop structure and make up a
single bi-directional memory interface channel.

Referring to FIG. 15, each communication channel 156 is
preferably eight bits wide, and each bit is transmitted
differentially. For example, output transceiver 170 for bit
D0out transmits both D0 and /D0 signals over the communi-
cation channel 156. Additional transceivers are similarly
provided for the remaining bits in the channel 156. (The
transceiver 176 for bit D7out and associated differential lines
178, 180 are shown in FIG. 15.) A CLKout transceiver 182
is also provided to generate differential clock outputs 184,
186 over the channel 156. To complete the link between
memory units 158, input transceivers 188-192 are provided
in each memory unit 158 for each of the differential bits and
clock signals transmitted over the communication channel
156. These input signals 172, 174, 178, 180, 184, 186 are
preferably transmitted through input buffers 194-198 to
other parts of the memory unit 158 (described above).

Each memory unit 158 also includes a skew calibrator 200 
and phase locked loop ("PLL") 202. The skew calibrator 200
is used to control skew in signals output to the communi-
cation channel 156. Preferably, digital skew fields are
employed, which include set numbers of delay stages to be
inserted in the output path of the communication channel
156. Setting these fields, and the corresponding analog skew
fields, permits a fine level of control over the relative skew
between output channel signals.

The PLL 202 recovers the clock signal on either side of 
the communication channel 156 and is thus provided to 
remove clock jitter. The clock signals 184, 186 preferably 
comprise a single phase, constant rate clock signal. The 
clock signals 184, 186 thus contain alternating zero and one 
values transmitted with the same timing as the data signals 
172, 174, 178, 180. The clock signal frequency is, therefore, 
one-half the byte data rate. The communication channel 156 
preferably operates at constant frequency and contains no 
auxiliary control, handshaking or flow control information. 

Each external memory unit 158 preferably defines two 
functional regions: a memory region, implemented by the 
cache 150 backed by standard memory 154 (see FIG. 13), 
and a configuration region, implemented by registers (not 
shown). Both regions are accessed by separate interfaces; 
the communication channel 156 is used to access the 
memory region, and a serial interface (described below) is 
used to access the configuration region. In the memory 
region, the caches 150 are preferably write-back (write-in) 
single-set (direct-map) caches for data originally contained 
in standard memory 154. All accesses to memory space 
should maintain consistency between the contents of the 
cache 150 and the contents of the standard memory 154. The 
configuration region registers provide the mechanism to 
detect and adjust skew in the communication channel 156. 
Software is preferably employed to adaptively adjust the 
skew in the channel 156 through digital skew fields, as 
explained above. The serial interface thus is used to con- 
figure the external memory units 158, set diagnostic modes 
and read diagnostic information, and to enable the use of a 
high-speed tester (not shown). 

One presently preferred embodiment of the invention
employs two byte-wide packet communication channels 156
(FIG. 16(a)). In order to further increase the bandwidth of the
general purpose media processor 12, up to sixteen byte-wide
packet communication channels 156 can be employed.
Referring to FIG. 16(b), twelve communication channels,
comprising eight memory channels 210, a ninth channel for
parallel processing 212 (described below), and three input/
output ("I/O") channels 214, are shown. Each of the com-
munication channels 210-214 preferably employs the cas-
cade configuration of four channel interface devices 216.
(Each channel interface device 216 coupled to the memory
channels 210 corresponds to the external memory unit 158
shown in FIG. 13.) Through each of the twelve communi-
cation channels shown in FIG. 16(b), the general purpose
media processor 12 can request or issue read or write
transactions. When not interleaved, the twelve channels
provide a single contiguous memory space for each channel
interface device 216.

Alternatively, memory accesses may be interleaved in 
order to provide for continuous access to the external 
memory system at the maximum bandwidth for the DRAM 
memories. In an interleaved configuration, at any point in 
time some memory devices will be engaged in row pre- 
charge, while others may be driving or receiving data, or 
receiving row or column addresses. The memory interface 
152 (FIG. 13) thus preferably maps between a contiguous 
address space and each of the separate address spaces made 
available within each external memory unit 158. For maxi- 
mum performance, therefore, the memory interface is inter- 
leaved so that references to adjacent addresses are handled 
by different memory devices. Moreover, in the preferred 
embodiment, additional memory operations may be
requested before the corresponding DRAM bank is avail-
able. In an interleaved approach, these operations are placed
in a queue until they can be processed. According to the
preferred embodiment, memory writes have lower priority
than memory reads, unless an attempt is made to read an
address that is queued for a write operation. As those skilled
in the art will appreciate, the depth of the memory write
queue is dictated by the specific implementation.
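
The read-over-write priority rule described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, method names and the use of two unbounded queues per DRAM bank are hypothetical, and the specification does not describe how the queue is implemented.

```python
from collections import deque

class BankQueue:
    """Pending operations for one DRAM bank (hypothetical structure)."""

    def __init__(self):
        self.reads = deque()    # pending read addresses, oldest first
        self.writes = deque()   # pending (address, data) pairs, oldest first

    def request_read(self, address):
        self.reads.append(address)

    def request_write(self, address, data):
        self.writes.append((address, data))

    def next_operation(self):
        """Return the next operation to issue when the bank is available."""
        if self.reads:
            address = self.reads[0]
            # A read of an address that is queued for a write must wait:
            # issue the oldest write first so the read sees current data.
            if any(a == address for a, _ in self.writes):
                return ("write",) + self.writes.popleft()
            self.reads.popleft()
            return ("read", address)       # reads otherwise take priority
        if self.writes:
            return ("write",) + self.writes.popleft()
        return None                        # nothing pending

q = BankQueue()
q.request_write(0x10, 1)
q.request_write(0x20, 2)
q.request_read(0x30)
q.request_read(0x20)
print(q.next_operation())  # ('read', 48)
print(q.next_operation())  # ('write', 16, 1)
print(q.next_operation())  # ('write', 32, 2)
print(q.next_operation())  # ('read', 32)
```

In this sketch the writes are drained in their original order until the conflicting write has been issued, which keeps write ordering intact at the cost of delaying the dependent read.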

Although up to four external memory units 158 are
preferably cascaded to form effectively larger memories,
some amount of latency may be introduced by the cascade.
Packets of data transmitted over the communication channel
156 are uniquely addressed to a particular channel interface
device 216. A packet received at a particular device, which
specifies another module address, is automatically passed to
the correct channel interface device 216. Unless the module
address matches a particular device 216, that packet simply
passes from the input to the output of the interface device
216. This mechanism divides the serial interconnection of
interface devices 216 into strings, which function as a single
larger memory or peripheral, but with possibly longer
response latency.

In addition to the memory channels 210, the general
purpose media processor 12 provides several communica-
tion channels 214 for communication with external input/
output devices. Referring to FIG. 16(b), three input/output
channels 214 having SRAM buffered memory (see FIG. 13)
provide an interface to external standard I/O devices (not
shown). Like the eight memory channels 210, the three I/O
channels 214 are byte-wide input/output channels intended
to operate at rates of at least one gigahertz. The three I/O
channels 214 also operate as a packet communication link to
synchronous SRAM memory 208 within the channel inter-
face device 216. A controller 226 within the channel inter-
face device 216 completes the interface to the I/O devices.

The three I/O channels 214 preferably function in like
manner to the memory channels 210 described above. The
interface protocol for the three I/O channels 214 divides read
and write operations to a single memory space into packets
containing command, address, data and acknowledgment
information. The packets also include a check code that will
detect single-bit transmission errors and some multiple-bit
errors. According to the preferred embodiment of the
invention, as many as eight operations may progress in each
interface device 216 at a time. As shown in FIG. 16(b), up
to four channel interface devices 216 can be cascaded
together to expand the bandwidth in the three I/O channels
214. A bit-serial interface (not shown) is also provided to
each of the channel interface devices 216 to allow access to
configuration, diagnostic and tester information at standard
TTL signal levels at a more moderate data rate. (A more
detailed description of the serial interface is provided
below.)

Like the memory channels 210, each I/O channel 214
includes nine signals: one clock signal and eight data
signals. Differential voltage levels are preferably employed
for each signal. Each channel interface device 216 is pref-
erably terminated in a nominal 50 ohm impedance to
ground. This impedance applies for both inputs and outputs
to the communication channel 156. A programmable termi-
nation impedance is preferred.
Interface Communication 

According to one presently preferred embodiment of the 
invention, the channel interface devices 216 can operate as 
either master devices or slave devices. A master device is
capable of generating a request on the communication 
channel 156 and receiving responses from the communica-
tion channel 156. Slave devices are capable of receiving
requests and generating responses, over the communication 
channel 156. A master device is preferably capable of 
generating a constant frequency clock signal and accepting 
signals at the same clock frequency over the communication 
channel 156. A slave device, therefore, should operate at the 
same clock rate as the communication channel 156, and 
generate no more than a specified amount of variation in 
output clock phase relative to input clock phase. The master 
device, however, can accept an arbitrary input clock phase 
and tolerates a specified amount of variation in clock phase 
over operating conditions. 

Packets of information sent over the communication 
channel 156 preferably contain control commands, such as 
read or write operations, along with addresses and associated 
data. Other commands are provided to indicate error con- 
ditions and responses to the above commands. When the 
communication channel 156 is idle, such as during initial-
ization and between transmitted packets, an idle packet,
consisting of an all-zero byte and an all-one byte, is
transmitted through the communication channel 156. Each
non-idle packet consists of two bytes or a multiple of two 
bytes, and begins with a byte having a value other than all 
zeros. All packets transmitted over the communication chan- 
nel 156 also begin during a clock period in which the clock 
signal is zero, and all packets preferably end during a clock 
period in which the clock signal is one. A depiction of the 
preferred packet protocol format for transmission over the 
communication channel 156 appears in FIG. 17. 
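
The framing rules above can be checked with a short sketch. The names are hypothetical, and the clock model (one clock value per byte period, alternating from zero at byte index zero) is an assumption made for illustration; command-specific packet lengths from FIG. 17 are not modeled.

```python
IDLE_PACKET = bytes([0x00, 0xFF])  # all-zero byte followed by all-one byte

def is_well_formed(packet: bytes) -> bool:
    """Check only the framing rules stated in the text."""
    if packet == IDLE_PACKET:
        return True
    # Non-idle packets: a multiple of two bytes, first byte not all zeros.
    return len(packet) >= 2 and len(packet) % 2 == 0 and packet[0] != 0x00

def clock_values(start_index: int, length: int) -> tuple[int, int]:
    """Clock value during the first and last byte of a packet, assuming the
    clock alternates 0, 1, 0, 1, ... once per byte period from index 0."""
    return start_index % 2, (start_index + length - 1) % 2

print(is_well_formed(bytes([0x4B, 0x10, 0x00, 0x7E])))  # True
print(is_well_formed(bytes([0x00, 0x10])))              # False
print(clock_values(6, 4))                               # (0, 1)
```

Because every packet, idle or not, has an even number of bytes, a packet that begins while the clock is zero necessarily ends while the clock is one, which is consistent with the rule stated in the text.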

The general form of each packet is an array of bytes 
preferably without a specific byte ordering. The first byte 
contains a module address 250 ("ma") in the high order two 
bits; a packet identifier, usually a command 252 ("com"), in 
the next three bit positions; and a link identification number 
254 ("lid") in the last three bit positions. The interpretation
of the remaining bytes of a packet depends upon the contents
of the packet identifier. The length of each packet is pref-
erably implied by the command specified in the initial byte
of the packet. A check byte is provided and computed as odd
bit-wise parity with a leftward circular rotation after accu-
mulating each byte. This technique provides detection of all
single-bit and some multiple-bit errors, but no correction is 
provided. 
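
A minimal sketch of the first-byte layout and of one possible reading of the check byte follows. The field positions come from the text; the check byte computation is an assumption, since the text does not spell out the exact procedure: each byte is exclusive-ORed into an eight-bit accumulator, the accumulator is rotated left by one bit, and the final value is complemented to give odd parity. All function names are hypothetical.

```python
def pack_header(ma: int, com: int, lid: int) -> int:
    """First packet byte: ma in bits 7-6, com in bits 5-3, lid in bits 2-0."""
    if not (0 <= ma < 4 and 0 <= com < 8 and 0 <= lid < 8):
        raise ValueError("field out of range")
    return (ma << 6) | (com << 3) | lid

def unpack_header(byte: int) -> tuple[int, int, int]:
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7

def rotl8(value: int) -> int:
    """Rotate an 8-bit value left by one bit position."""
    return ((value << 1) | (value >> 7)) & 0xFF

def check_byte(payload: bytes) -> int:
    """Assumed procedure: XOR-accumulate each byte, rotate left after each
    byte, then complement the result to obtain odd parity."""
    acc = 0
    for b in payload:
        acc = rotl8(acc ^ b)
    return acc ^ 0xFF

header = pack_header(2, 5, 3)
print(hex(header), unpack_header(header))   # 0xab (2, 5, 3)

packet = bytes([header, 0x12, 0x34])
corrupted = bytes([packet[0], packet[1] ^ 0x08, packet[2]])
print(check_byte(corrupted) != check_byte(packet))  # True
```

Under this reading, any single flipped bit changes exactly one bit of the check byte, so all single-bit errors are detected, while some multiple-bit errors cancel and go unnoticed.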

The module address 250 field of each packet is prefer-
ably a two-bit field and allows for as many as four slave
devices to be operated from a single communication channel 
156. Module address values can be assigned in one of two 
fashions: either dynamically assigned through a configura- 
tion register (not shown), or assigned via static/geometric 
configuration pins. Dynamic assignment through a configu- 
ration register is the presently preferred method for assign- 
ing module address values. 

The link identification number 254 field is preferably 
3-bits wide and provides the opportunity for master devices 
to initiate as many as eight independent operations at any 
one time to each slave device. Each outstanding operation 
requires a distinct link identification number, but no ordering
of operations should be implied by the value of the link 
identification field. Thus, there is preferably no requirement 
for link identification values 254 to be sequentially assigned 
either in requests or responses. 
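
The bookkeeping a master device might perform for the three-bit link identification field can be sketched as below. The class and method names are hypothetical and the specification does not describe an allocator; the sketch only reflects the stated constraints of eight distinct outstanding values per slave device and no implied ordering.

```python
class LinkIdPool:
    """Tracks the eight link identification numbers for one slave device."""

    def __init__(self, size: int = 8):
        self.free = set(range(size))
        self.outstanding = {}        # lid -> pending request description

    def issue(self, request):
        """Reserve a link id for a new request, or return None if all are in use."""
        if not self.free:
            return None
        lid = self.free.pop()        # any free value; no ordering is implied
        self.outstanding[lid] = request
        return lid

    def complete(self, lid: int):
        """Release the link id when its response arrives, in any order."""
        request = self.outstanding.pop(lid)
        self.free.add(lid)
        return request

pool = LinkIdPool()
ids = [pool.issue(f"read {n}") for n in range(8)]
print(pool.issue("read 8"))      # None
print(pool.complete(ids[5]))     # read 5
```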

The receipt of packets over the communication channel
156 that do not conform to the channel protocol preferably
generates an error condition. As those skilled in the art will
appreciate, the degree to which a specific imple-
mentation detects errors is defined by the user. In one
presently preferred embodiment of the invention, all errors
are detected, and the following protocol is employed for
handling errors. For each error detected, the channel inter-
face device 216 causes a response explicitly indicating the
error condition. Channel interface devices 216 reporting an
invalid packet will then suppress the receipt of additional
packets until the error is cleared. The transmitted packet is
otherwise ignored. However, even though the erroneous
packet is ignored, the channel interface devices 216 prefer-
ably continue to process valid packets that have already been
received and generate responses thereto. The presently
preferred commands 252 to be used over the
communication channel 156 are listed in FIG. 17.

In the master/slave preferred embodiment, the channel 
interface devices 216 forward packets that are intended for 
other devices connected to the communication channel 156, 
as described above. In slave devices, forwarding is per- 
formed based on the module address 250 field of the packet. 
Packets which contain a module address 250 other than that 
of the current device are forwarded on to the next device. All 
non-idle packets are thus forwarded including error packets. 
In master devices, forwarding is performed based on the link
identifier number 254 of the packet. Packets that contain link
identifier numbers 254 not generated by the specific channel
interface device 216 are forwarded. In order to reduce
transmission latency, a packet buffer may be provided. As
those skilled in the art appreciate, the suitable size for the
packet buffer depends on the amount of latency tolerable in
a particular implementation.
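
The two forwarding rules can be summarized in a short sketch. It relies on the first-byte layout given earlier (module address in the two high-order bits, link identification number in the three low-order bits); the function names and the representation of a cascade as a list of module addresses are hypothetical.

```python
def slave_action(own_module_address: int, first_byte: int) -> str:
    """Slave devices forward on a module address mismatch."""
    ma = (first_byte >> 6) & 0x3
    return "consume" if ma == own_module_address else "forward"

def master_action(own_link_ids: set[int], first_byte: int) -> str:
    """Master devices forward packets whose link id they did not generate."""
    lid = first_byte & 0x7
    return "consume" if lid in own_link_ids else "forward"

def consuming_slave(slave_addresses: list[int], first_byte: int):
    """Index of the slave in a cascade that consumes the packet, or None
    if the packet passes through every slave."""
    for hops, address in enumerate(slave_addresses):
        if slave_action(address, first_byte) == "consume":
            return hops
    return None

first_byte = (2 << 6) | (1 << 3) | 5        # ma=2, com=1, lid=5
print(consuming_slave([0, 1, 2, 3], first_byte))  # 2
print(master_action({5}, first_byte))             # consume
print(master_action({1, 2}, first_byte))          # forward
```

The index returned by consuming_slave is also the number of devices the packet was forwarded through, which is the source of the longer response latency noted for cascaded configurations.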

A variety of master/slave ring configurations are possible 
using the high bandwidth interface 124 of the invention. 
Five ring configurations are currently preferred: single-
master, dual-master, multiple-master, single-slave and
multiple-master/multiple-slave. The simplest ring configu-
ration contains a single non-forwarding master device and a
single non-forwarding slave device. No forwarding is
required for either device in this configuration as packets are
sent directly to the recipient. A single-master ring, however,
may contain a cascade of up to four slave devices (see FIGS.
13, 16). In the single-master ring configuration, each slave
device is configured to a distinct module address, and each
slave device forwards packets that contain module address
fields unequal to their own. As discussed above, a single-
master ring provides a larger memory or I/O capacity than
a master-slave pair, but also introduces a potentially longer
response latency. In the single-master ring, each slave device
may have as many as eight transactions outstanding at any
time, as described above.

The remaining combinations share many of the above 
basic attributes. In a dual-master pair, each master device 
may initiate read and write operations addressed to the other, 
and each may have up to eight such transactions outstanding.
No forwarding is required for either device because packets
are sent directly to the recipient. A multiple-master ring may
contain multiple master devices and a single slave device. In
this configuration, the slave device need not forward packets
as all input packets are designated for the single slave
device. A multiple-master ring may also contain multiple master
devices and as many as four slave devices. Each slave device
may have up to eight transactions outstanding, and each
master device may use some of those transactions. In a
preferred embodiment, a master also has the capability to
detect a time-out condition or when a response to a request
packet is not received. Further aspects of interprocessor
communications and configurations are discussed below in
connection with FIG. 18.

Serial Bus

In one preferred embodiment of the invention, the general 
purpose media processor 12 includes a serial bus (not
shown). The serial bus is designed to provide bootstrap
resources, configuration, and diagnostic support to the gen- 
eral purpose media processor 12. The serial bus preferably 
employs two signals, both at TTL levels, for direct commu- 
nication among many devices. In the preferred embodiment, 
the first signal is a continuously running clock, and the 
second signal is an open-collector bi-directional data signal. 
Four additional signals provide geographic addresses for 
each device coupled to the serial bus. A gateway protocol, 
and optional configurable addressing, each provide a means 
to extend the serial bus to other buses and devices. Although 
the serial bus is designed for implementation in a system 
having a general purpose media processor 12, as those 
skilled in the art will appreciate, the serial bus is applicable 
to other systems as well. 

Because the serial bus is preferably used for the initial 
bootstrap program load of the general purpose media pro- 
cessor 12, the bootstrap ROM is coupled to the serial bus. As 
a result, the serial bus needs to be operational for the first 
instruction fetch. The serial bus protocol is therefore devised 
so that no transactions are required for initial bus configu- 
ration or bus address assignment. 

According to the preferred embodiment, the clock signal 
comprises a continuously running clock signal at a minimum 
of 20 megahertz. The amount of skew, if any, in the clock 
signal between any two serial bus devices should be limited 
to be less than the skew on the data signal. Preferably, the 
serial data signal is a non-inverted open collector 
bi-directional data signal. TTL levels are preferred for com-
munication on the serial bus, and several termination net-
works may be employed for the serial data signal. A simple
preferred termination network employs a resistive pull-up of
220 ohms to 3.3 volts above VSS. An alternate embodiment
employs a more complex termination network such as a 
termination network including diodes or the "Forced Perfect 
Termination" network proposed for the SCSI-2 standard, 
which may be advantageous for larger configurations. 

The geographic addressing employed in the serial bus is 
provided to ensure that each device is addressable with a
number that is unique among all devices on the bus and 
which also preferably reflects the physical location of the 
device. Thus, the address of each device remains the same 
each time the system is operated. In one preferred 
embodiment, the geographic address is composed of four 
bits, thus allowing for up to 16 devices. In order to extend 
the geographic addressing to more than 16 devices, addi- 
tional signals may be employed such as a buffered copy of 
the clock signal or an inverted copy of the clock signal (or 
both). 

The serial bus preferably incorporates both a bit level and 
packet protocol. The bit level protocol allows any device to 
transmit one bit of information on the bus, which is received 
by all devices on the bus at the same time. Each transmitted 
bit begins at the rising edge of the clock signal and ends at 
the next rising edge. The transmitted bit value is sampled at 
the next rising edge of the clock signal. According to one 
preferred embodiment where the serial data signal is an open 
collector signal, the transmission of a zero bit value on the 
bus is achieved by driving the serial data signal to a logical 
low value. In this embodiment, the transmission of a one bit 
value is achieved by releasing the serial data signal to obtain 
a logical high value. If more than one device attempts to 
transmit a value on the same clock, the resulting value is a 
zero if any device transmits a zero value, and one if all 
devices transmit a one value. This provides a "wired-AND"
collision mechanism, as those skilled in the art will appre-
ciate. If two or more devices transmit the same value on the
same clock cycle, however, no device can detect the occur-
rence of a collision. In such cases, the transaction, which
may occur frequently in some implementations, preferably
proceeds as described below.

The packet protocol employed with the serial bus uses the
bit level protocol to transmit information in units of eight
bits or multiples of eight bits. Each packet transmission
preferably begins with a start bit in which the serial data
signal has a zero (driven) value. After transmitting the eight
data bits, a parity bit is transmitted. The transmission con-
tinues with additional data. A single one (released) bit is
transmitted immediately following the least significant bit of
each byte signaling the end of the byte.

On the cycle following the transmission of the parity bit,
any device may demand a delay of two cycles to process the
data received. The two cycle delay is initiated by driving the
serial data signal (to a zero value) and releasing the serial
data signal on the next cycle. Before releasing the serial data
signal, however, it is preferable to ensure that the signal is
not being driven by any other device. Further delays are
available by repeating this pattern.

In order to avoid collisions, a device is not permitted to
start a transmission over the serial bus unless there are no
currently executing transactions. To resolve collisions that
may occur if two devices begin transmission on the same
cycle, each transmitting device should preferably monitor
the bus during the transmission of one (released) bits. If any
of the bits of the byte are received as zero when transmitting
a one, the device has lost arbitration and must cease trans-
mission of any additional bits of the current byte or trans-
action.

According to the preferred embodiment of the invention,
a serial bus transaction consists of the transmission of a
series of packets. The transaction begins with a transmission
by the transaction initiator, which specifies the target
network, device, length, type and payload of the transaction
request. The transaction terminates with a packet having a
type field in a specified range. As a result, all devices
connected to the serial bus should monitor the serial data
signal to determine when transactions begin and end. A
serial bus network may have multiple simultaneous trans-
actions occurring, however, so long as the target and initiator
network addresses are all disjoint.
Parallel Processing

In one preferred embodiment of the invention, two or
more general purpose media processors 12 can be linked
together to achieve a multiple processor system. According
to this embodiment, general purpose media processors 12
are linked together using their high bandwidth interface
channels 124, either directly or through external switching
components (not shown). The dual-master pair configuration
described above can thus be extended for use in multiple-
master ring configurations. Preferably, internal daemons
provide for the generation of memory references to remote
processors, accesses to local physical memory space, and the
transport of remote references to other remote processors. In
a multi-processor environment, all general purpose media
processors 12 run off of a common clock frequency, as
required by the communication channels 156 that connect
between processors.

transponder daemons in each processor. In an alternate
embodiment shown in FIG. 18(b), the inter-processor links
218 may be used to join the general purpose media proces-
sors 12 in a ring configuration. Alternatively still, general
purpose media processors 12 may be interconnected into a
two-dimensional network of processors of arbitrary size, as
shown in FIG. 18(c). Sixteen processors are connected in
FIG. 18(c) by connecting four ring networks. In yet another
alternate embodiment, by connecting the inter-processor
links 218 to external switching devices (not shown), multi-
processors with a large number of processors can be con-
structed with an arbitrary interconnection topology.

The requester, responder and transponder daemons pref-
erably handle all inter-processor operations. When one gen-
eral purpose media processor 12 attempts a load or store to
a physical address of a remote processor, the requester
daemon autonomously attempts to satisfy the remote
memory reference by communicating with the external
device. The external device may comprise another processor
12 or a switching device (not shown) that eventually reaches
another processor 12. Preferably, two requester daemons are
provided in each processor 12, which act concurrently on two
different byte channels and/or module addresses. The
responder daemon accepts writes from a specified channel
and module address, which enables an external device to
generate transaction requests in local memory or to generate
processor events. The responder daemon also generates link
level writes to the same external device that communicated
responses for the received transaction request. Two such
responder daemons are preferably provided, each of which
operates concurrently on two different byte channels and/or
module addresses.

The transponder daemon accepts writes from a specified
channel and module address, which enable an external
device to cause a requester daemon to generate a request on
another channel and module address. Preferably, two such
transponder daemons are provided, each of which acts con-
currently (back-to-back) between two different byte channel
and/or module addresses. As those skilled in the art will
appreciate, the requester, responder and transponder dae-
mons must act cooperatively to avoid deadlock that may
arise due to an imbalance of requests in the system. Dead-
locks prevent responses from being routed to their
destinations, which may defeat the benefits of a multi-
processor distributed system.

According to one presently preferred embodiment of the
invention, the general purpose media processor 12 can be
implemented as one or more integrated circuit chips. Refer-
ring to FIG. 19, the presently preferred embodiment of the
general purpose media processor 12 consists of a four-chip
set. In the four-chip set, a general purpose media processor
12 is manufactured as a stand alone integrated circuit. The
stand alone integrated circuit includes a memory manage-
ment unit 122, instruction and data cache/buffers 118, 120,
and an execution unit 100. A plurality of signal input/output
pads 260 are provided around the circumference of the
integrated circuit to communicate signals to and from the
general purpose media processor 12 in a manner generally
known in the art.

The second and third chips of the four-chip set comprise
Referring to FIG. 18, each general purpose media pro- in an external memory element 158 and a chaimel interface 
cessor 12 preferably includes at least a pair of inter- device 216. The external memory element 158 includes an 
processor links 218 (see also FIG. 16(6)). In one interface to the communication channel 156, a cache 150 
configuration, both pairs of inter-processor links 218 can be and a memory interface 152, The channel interface device 
connected between the two processors 12 to further enhance 65 216 also includes an interface to the communication channel 
bandwidth. As shown in FIG. 18(a) several processors 12 156, as well as buffer memory 262, and input/output inter- 
may be interconnected in a linear network employing the faces 264. Both the external memory element 158 and the 
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channel interface device 216 include a plurality of input/ 
output signal pads 260 to communicate signals to and from 
these devices in a generally known manner. 

The fourth integrated circuit chip comprises a switch 226,
which allows for installation of the general purpose media
processor 12 in the heterogeneous network 38. In addition to
the plurality of input/output pads 260, the switch 226
includes an interface to the communication channel 156. The
switch 226 also preferably includes a buffer 262, a router
266, and a switch interface 268.

As those skilled in the art will appreciate, many imple- 
mentations for the general purpose media processor 12 are 
possible in addition to the four-chip implementation 
described above. Rather than an integrated approach, the 
general purpose media processor can be implemented in a
discrete manner. Alternatively, the general purpose media 
processor 12 can be implemented in a single integrated 
circuit, or in an implementation with fewer than four inte- 
grated circuit chips. Other combinations and permutations of 
these implementations are contemplated. 

There has been described a system for processing streams 
of media data at substantially peak rates to allow for real 
time communication over a large heterogeneous network. 
The system includes a media processor at its core that is 
capable of processing such media data streams. The hetero- 
geneous network consists of, for example, the fiber optic/ 
coaxial cable/twisted wire network in place throughout the 
U.S. To provide for such communication of media data, a 
media processor according to the invention is disposed at 
various locations throughout the heterogeneous network.
The media processor would thus function both in a server
capacity and at an end user site within the network. 
Examples of such end user sites include televisions, set-top 
converter boxes, facsimile machines, wireless and cellular 
telephones, as well as large and small business and industrial
applications. 

To achieve such high rates of data throughput, the media 
processor includes an execution unit, high bandwidth 
interface, memory management unit, and pipelined instruction
and data paths. The high bandwidth interface includes
a mechanism for transmitting media data streams to and 
from the media processor at rates at or above the gigahertz 
frequency range. The media data stream can consist of 
transmission, presentation and storage type data transmitted 
alone or in a unified manner. Examples of such data types
include audio, video, radio, network and digital communi- 
cations. According to the invention, the media processor is 
dynamically partitionable to process any combination or 
permutation of these data types in any size. 
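
The idea of a dynamically partitionable data path can be sketched in software as follows. This is an illustrative sketch only, not the disclosed hardware: the function name partitioned_add, the default 128-bit path width, and the choice of wrap-around (modular) arithmetic within each symbol are assumptions made for the example. It shows one data path word being treated as several independent symbols of a chosen width, with no carry passing from one symbol to the next.

```python
def partitioned_add(a: int, b: int, symbol_bits: int, path_bits: int = 128) -> int:
    """Add two path_bits-wide words symbol by symbol.

    Hypothetical helper: the 128-bit default width and the modular
    wrap-around inside each symbol are assumptions for illustration.
    """
    if symbol_bits <= 0 or path_bits % symbol_bits != 0:
        raise ValueError("symbol width must evenly divide the data path width")
    lane_mask = (1 << symbol_bits) - 1
    result = 0
    for shift in range(0, path_bits, symbol_bits):
        # Extract the matching symbol from each operand and add them.
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Discard any carry out of the symbol so neighbours are unaffected.
        result |= (lane_sum & lane_mask) << shift
    return result


# The same 16-bit operands, partitioned two different ways.
as_bytes = partitioned_add(0x01FF, 0x0101, symbol_bits=8, path_bits=16)
as_word = partitioned_add(0x01FF, 0x0101, symbol_bits=16, path_bits=16)
```

With 8-bit symbols the low byte wraps to zero without disturbing the high byte, whereas with a single 16-bit symbol the carry propagates across the full word. The same operands therefore give different results depending only on the chosen symbol width.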

A programmable, general purpose media processor sys- 
tem presents significant advantages over current multimedia 
communications. Rather than rigid, costly and inefficient
specialized processors, the media processor provides a general
purpose instruction set to ease programmability in a
single device that is capable of performing all of the opera- 
tions of the specialized processor combination. Providing a 
uniform instruction set for all media related operations
eliminates the need for a programmer to learn several 
different instruction sets, each for a different specialized 
processor. The complexity of programming the specialized 
processors to work together and communicate with one 
another is also greatly reduced. The unified instruction set is 
also more efficient. Highly specialized general calculation
instructions that are tailored to general or special types of
calculations rather than enhancing communication are eliminated.

Moreover, the media processor system can be easily
reprogrammed simply by transmitting or downloading new 
software over the network. In the specialized processor 
approach, new programming usually requires the delivery 
and installation of new hardware. Reprogramming the media
processor can be done electronically, which of course is 
quicker and less costly than the replacement of hardware. 

It is to be understood that a wide range of changes and 
modifications to the embodiments described above will be 
apparent to those skilled in the art and are contemplated. It 
is therefore intended that the foregoing detailed description 
be regarded as illustrative rather than limiting, and that it be
understood that it is the following claims, including all 
equivalents, that are intended to define the spirit and scope 
of this invention. 

We claim: 

1. A system for unified media processing comprising: 

a plurality of general purpose media processors, each 
media processor being operable at sustained peak data 
rates and having a dynamically partitioned execution 
unit, wherein a plurality of media data streams are
concurrently transmitted over a single data path and are 
dynamically partitioned according to an elemental 
symbol width that is equal to or narrower than the data 
path, and having a high bandwidth interface, the high 
bandwidth interface coupled to external memory and 
input/output elements to receive and transmit data to 
the media processor at substantially peak rates; and 

a bi-directional communication fabric, the plurality of
media processors coupled to the bi-directional commu- 
nication fabric to transmit and receive at least one 
media stream comprising presentation, transmission, 
and storage media information; and 

wherein each media processor further comprises dedi- 
cated memory and wherein the each of the plurality of 
media processors can employ any unused portion of the
dedicated memory of another media processor in a
shared manner to efficiently store and retrieve
presentation, transmission and storage media informa- 
tion at substantially peak data rates. 

2. The system defined in claim 1, wherein the 
bi-directional communication fabric comprises a fiber optic 
network.

3. The system defined in claim 1, wherein the 
bi-directional communication fabric comprises an heteroge- 
neous network. 

4. The system defined in claim 1, wherein the 
bi-directional communication fabric comprises a coaxial 
cable network. 

5. The system defined in claim 1, wherein the 
bi-directional communication fabric comprises a wireless 
network. 

6. The system defined in claim 1, wherein a subset of the 
plurality of media processors comprise network servers. 

7. The system defined in claim 1, wherein the plurality of 
media processors are programmable by downloading pro- 
gram information over the bi-directional communication 
fabric.

8. The system defined in claim 1, wherein the plurality of 
media processors can access an idle execution unit of 
another media processor in a shared manner to efficiently
process presentation, transmission and storage media information
at substantially peak data rates.